Abstract

Given a weighted graph G and an error parameter $\varepsilon > 0$, the graph sparsification problem requires
sampling edges in G and giving the sampled edges appropriate weights to obtain a sparse graph $G_\varepsilon$ with
the following property: the weight of every cut in $G_\varepsilon$ is within a factor of $(1 \pm \varepsilon)$ of the weight of the
corresponding cut in G. Benczúr and Karger [2] showed how to obtain $G_\varepsilon$ with $O(n \log n/\varepsilon^2)$ edges
in time $O(m \log^3 n)$ for weighted graphs and $O(m \log^2 n)$ for unweighted graphs using a combinatorial
approach based on strong connectivity. Spielman et al. [22] showed how to obtain $G_\varepsilon$ with $O(n \log n/\varepsilon^2)$
edges in time $O(m \log^c n)$ for some (large) constant c using an algebraic approach based on effective
resistances. Our contributions are as below (all for weighted graphs G with n vertices and m edges
having polynomial-sized weights, unless otherwise stated):

• Benczúr and Karger [2] conjectured that using standard connectivity instead of strong connectivity
for sampling would simplify the result substantially, and posed this as an open question. In this
correspondence, we resolve this question by showing that sampling using standard connectivities
also preserves cut weights and yields a $G_\varepsilon$ with $O(n \log^2 n/\varepsilon^2)$ edges.

• We provide a very simple strictly linear-time algorithm (i.e. $O(m)$ time) for graph sparsification
that yields a $G_\varepsilon$ with $O(n \log^2 n/\varepsilon^2)$ edges.

• We provide another algorithm for graph sparsification that yields a $G_\varepsilon$ with $O(n \log n/\varepsilon^2)$ edges in
$O(m \log^2 n)$ time (for unweighted graphs, this reduces to $O(m \log n)$ time).

• Combining the above two results, we obtain the fastest known algorithm for obtaining a $G_\varepsilon$ with
$O(n \log n/\varepsilon^2)$ edges; this algorithm runs in time $O(m + n \log^4 n/\varepsilon^2)$ whereas the previous best
bound is $O(m \log^3 n)$.

• If G has arbitrary edge weights, we give an $O(m \log^2 n)$-time algorithm that yields a $G_\varepsilon$ containing
$O(n \log^2 n/\varepsilon^2)$ edges. The previous best bound is $O(m \log^3 n)$ time for a $G_\varepsilon$ with $O(n \log n/\varepsilon^2)$
edges.

• Most importantly, we provide a generic framework that sets out sufficient conditions for any particular
sampling scheme to result in good sparsifiers; all the above results can be obtained by simple
instantiations of this framework, as can known results on sampling by strong connectivity and
sampling by effective resistances¹.

Our algorithms are Monte Carlo, i.e. they work with high probability, as are all efficient algorithms for graph
sparsification.

A key ingredient of our proofs is a generalization of bounds on the number of small cuts in an
undirected graph due to Karger [8]; this generalization might be of independent interest.

1 With a $G_\varepsilon$ that is slightly denser than the best-known result for the effective resistance case.

1 Introduction



A cut of an undirected graph is a partition of its vertices into two disjoint sets. The weight of a cut is the sum 
of weights of the edges crossing the cut, i.e. edges having one endpoint each in the two vertex subsets of 
the partition. For unweighted graphs, each edge is assumed to have unit weight. Cuts play an important role 
in many problems in graphs: e.g., the maximum flow between a pair of vertices is equal to the minimum 
weight cut separating them. 

A skeleton G' of an undirected graph G is a subgraph of G on the same set of vertices where each edge in
G' can have an arbitrary weight. In a series of results, Karger [?] showed that an appropriately weighted
sparse skeleton generated by random sampling of edges approximately preserves the weight of every cut
in an undirected graph. This series of results culminated in a seminal work by Benczúr and Karger [2]
that showed the following theorem. Throughout this paper, for any undirected graph G and any $\varepsilon \in (0, 1]$,
$(1 \pm \varepsilon)G$ is the set of all appropriately weighted subgraphs of G where the weight of every cut in the subgraph
is within a factor of $(1 \pm \varepsilon)$ of the weight of the corresponding cut in G.
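
To make the definition of $(1 \pm \varepsilon)G$ concrete, the following Python sketch checks the cut condition by brute force on a tiny example. The names `cut_weight` and `is_cut_sparsifier` and the edge-list representation are illustrative assumptions, not notation from this paper; the enumeration takes time exponential in n, so it is meant only as a sanity check on very small graphs.

```python
from itertools import combinations

def cut_weight(edges, side):
    """Total weight of the edges (u, v, w) with exactly one endpoint in `side`."""
    return sum(w for u, v, w in edges if (u in side) != (v in side))

def is_cut_sparsifier(vertices, g_edges, h_edges, eps):
    """Check by brute force that every cut of H is within (1 +/- eps) of G."""
    vertices = list(vertices)
    first, rest = vertices[0], vertices[1:]
    # Keep `first` on one side so that each partition is enumerated once;
    # r < len(rest) excludes the trivial cut with all vertices on one side.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            wg = cut_weight(g_edges, side)
            wh = cut_weight(h_edges, side)
            if not ((1 - eps) * wg <= wh <= (1 + eps) * wg):
                return False
    return True

# A unit-weight triangle and a skeleton keeping two edges with weight 1.5 each.
V = ["a", "b", "c"]
G = [("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0)]
H = [("a", "b", 1.5), ("b", "c", 1.5)]
print(is_cut_sparsifier(V, G, H, 0.5))  # True
print(is_cut_sparsifier(V, G, H, 0.2))  # False
```

In this example every cut of the triangle has weight 2, while the cuts of the skeleton have weight 1.5 or 3, so the skeleton lies in $(1 \pm 0.5)G$ but not in $(1 \pm 0.2)G$.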

Theorem 1 (Benczúr-Karger [2]). For any undirected graph G with m edges and n vertices, and for any
error parameter $\varepsilon \in (0, 1]$, there exists a skeleton $G_\varepsilon$ containing $O(n \log n/\varepsilon^2)$ edges such that $G_\varepsilon \in (1 \pm \varepsilon)G$
with high probability². Further, such a skeleton can be found in $O(m \log^2 n)$ time if G is unweighted and
$O(m \log^3 n)$ time otherwise.

Besides its combinatorial ramifications, the importance of this result stems from its use as a pre-processing
step in several graph algorithms, e.g. to obtain an $\tilde{O}(n^{3/2} + m)$-time algorithm for approximate maximum
flow using the $\tilde{O}(m\sqrt{m})$-time algorithm for exact maxflow due to Goldberg and Rao [6]; and more recently,
$\tilde{O}(n^{3/2} + m)$-time algorithms for approximate sparsest cut [12, 20].

Subsequent to Benczúr and Karger's work, Spielman and Teng [23, 24] extended their results to preserving
all quadratic forms, of which cuts are a special case; however, the size of the skeleton constructed
was $O(n \log^c n)$ for some large constant c. Spielman and Srivastava [22] improved this result by constructing
skeletons of size $O(n \log n/\varepsilon^2)$ in $O(m \log^{O(1)} n)$ time, while continuing to preserve all quadratic forms. Recently,
this result was further improved by Batson et al. [1], who gave a deterministic algorithm for constructing
skeletons of size $O(n/\varepsilon^2)$. While their result is optimal in terms of the size of the skeleton constructed, the
time complexity of their algorithm is $O(n^3 m/\varepsilon^2)$, which makes it impractical for most applications.

Benczúr and Karger [2], and Spielman et al. [23, 24, 22, 1] use contrasting techniques to obtain their
respective results; the former use combinatorial graph techniques while the latter use algebraic graph
techniques. In each case, the goal is to obtain a probability value $p_e$ for each edge e so that sampling each edge
e independently with probability $p_e$ and giving each sampled edge e a weight $1/p_e$ yields $G_\varepsilon \in (1 \pm \varepsilon)G$.
Benczúr and Karger [2] choose $p_e$ inversely proportional to the strong connectivity of e while Spielman et
al. [23, 24, 22, 1] choose $p_e$ proportional to the effective resistance of e (both concepts are defined below).

Definition 1. The strong connectivity of an edge (u,v) in an undirected graph G is the maximum value of k
such that there is an induced subgraph G' of G containing both u and v, and every cut in G' has weight at
least k.

Definition 2. The effective resistance of an edge (u,v) in an undirected graph G is the effective electrical 
resistance between u and v if each edge in G is replaced by an electrical resistor between its endpoints 
whose electrical resistance is equal to the weight of the edge. 

2 We say that a property holds with high probability (or whp) for a graph on n vertices if its failure probability can be bounded
by the inverse of a fixed polynomial in n.

1.1 Our Results



We obtain the following results. 

The Generic Framework. We provide a general proof framework as follows. For any given sampling
scheme (i.e., assignment to the $p_e$'s), we show that if this assignment satisfies two sufficient conditions, then
the sampling scheme results in good sparsifiers. All of the results stated below are then simple instantiations
of the above framework, i.e. we show that the sufficient conditions hold. The resulting algorithms are also
much simpler than those in [2] or in [22, 1].

Faster Algorithms. Our first result is an efficient algorithm for constructing a sparse skeleton. 

Theorem 2. Suppose G is an undirected graph with n vertices and m edges. Then, for any fixed $\varepsilon \in (0, 1]$,
there is an efficient algorithm for finding a skeleton $G_\varepsilon$ of G having $O(n \log n/\varepsilon^2)$ edges in expectation such that
$G_\varepsilon \in (1 \pm \varepsilon)G$ whp. The time complexity of the algorithm is $O(m + n \log^4 n/\varepsilon^2)$ if the weights of all edges
are bounded by a fixed polynomial in n (including all unweighted graphs).

This is the first sampling algorithm that runs in time strictly linear in m; all previous algorithms had a time
bound of at least $O(m \log^2 n)$ for unweighted graphs, and $O(m \log^3 n)$ for weighted graphs. This algorithm
improves the time complexity of several problems where creating a graph sparsifier is the first step. We
mention some of these applications.

• This yields an $O(m) + \tilde{O}(n^{3/2}/\varepsilon^3)$-time algorithm for finding an $\varepsilon$-approximate maximum flow
between two vertices of an undirected graph using the exact maxflow algorithm in [6]. The previous
best algorithm had a running time of $O(m \log^3 n) + \tilde{O}(n^{3/2}/\varepsilon^3)$.

• This yields an $O(m) + \tilde{O}(n^{3/2})$-time algorithm for finding an $O(\log n)$-approximate sparsest cut [12, 20],
and an $O(m) + \tilde{O}(n^{3/2+\delta})$-time algorithm for finding an $O(\sqrt{\log n})$-approximate sparsest cut for
any constant $\delta$ [20]. The previous best algorithms had running times of $O(m \log^3 n) + \tilde{O}(n^{3/2})$ and
$O(m \log^3 n) + \tilde{O}(n^{3/2+\delta})$ respectively.

The sampling algorithm in Theorem 2 is obtained by composing two different algorithms described below.
The first algorithm is fast but generates a slightly denser skeleton. The second (slower) algorithm then
operates on this skeleton to obtain a smaller skeleton.

Theorem 3. Suppose G is an undirected graph with n vertices and m edges. Then, for any fixed $\varepsilon \in (0, 1]$,
there is an efficient algorithm for finding a skeleton $G_\varepsilon$ of G having $O(n \log^2 n/\varepsilon^2)$ edges in expectation such that
$G_\varepsilon \in (1 \pm \varepsilon)G$ whp. The time complexity of the algorithm is $O(m)$ if the weights of all edges are bounded
by a fixed polynomial in n (including all unweighted graphs), and $O(m \log^2 n)$ if the edges have arbitrary
weights.

Theorem 4. Suppose G is an undirected graph with n vertices and m edges. Then, for any fixed $\varepsilon \in (0, 1]$,
there is an algorithm for finding a skeleton $G_\varepsilon$ of G having $O(n \log n/\varepsilon^2)$ edges in expectation such that
$G_\varepsilon \in (1 \pm \varepsilon)G$ whp. The time complexity of the algorithm is $O(m \log n)$ for unweighted graphs and $O(m \log^2 n)$ if
the weights of all edges are bounded by a fixed polynomial in n.

Sampling by Standard Connectivity, Effective Resistances and Strong Connectivity. In proving Theorem 1,
Benczúr and Karger [2] had to use strong connectivity because the more natural notion of standard connectivities
seemed to pose complications.

Definition 3. The standard connectivity, or simply connectivity, of an edge (u,v) in an undirected graph G 
is the maximum flow between u and v in G. 

The authors conjectured that using standard connectivity instead of strong connectivity for sampling would 
simplify the result substantially, and posed this as their main open question. In this correspondence, we 
resolve this question by showing that sampling using standard connectivities also preserves cut weights. 

Theorem 5. Suppose G is an undirected graph on n vertices. For any fixed $\varepsilon \in (0, 1]$, let $G_\varepsilon$ be a skeleton
of G formed by sampling edge e in G with probability p e = min ( 96 ^ ^ "\ ln n , 1), where k e is the standard 
connectivity of edge e in G. If selected in the sample, edge e is given a weight of $1/p_e$ in the skeleton. Then,
$G_\varepsilon$ has $O(n \log^2 n/\varepsilon^2)$ edges in expectation and $G_\varepsilon \in (1 \pm \varepsilon)G$ whp.

Observe that the size of the skeleton constructed using standard connectivity has an extra log n factor com- 
pared to that constructed using strong connectivity. We conjecture that this factor can indeed be removed by 
more careful analysis. 

We show that exactly the same proof as above holds if we replace standard connectivity with effective
resistance of an edge. Thus, we show that sampling edges using effective resistances also produces a sparse
skeleton that approximately preserves all cut weights, a result independently obtained by Spielman and
Srivastava recently for the larger class of all quadratic forms (cuts are a special type of quadratic form) with
a tighter bound on the size of the skeleton [22]. Our result, though weaker, has a much simpler proof.

We also show that the results obtained in [2] using strong connectivity can be obtained as a simple
instantiation of our general sampling framework.

Generalizations of Cut Counting. The edge connectivity of an undirected graph is the minimum weight
of a cut in the graph. A key ingredient in the proof of Theorem 1 is a celebrated theorem due to Karger [8]
that gives tight bounds on the number of distinct cuts of a fixed weight in an undirected graph in terms of
the ratio of the weight of the cuts to the edge connectivity of the graph.

Theorem 6 (Karger [8]). For an undirected graph with edge connectivity c and for any $\alpha \ge 1$, the number
of cuts of weight at most $\alpha c$ is at most $O(n^{2\alpha})$.

While this theorem is extremely useful in bounding the number of small cuts in an undirected graph (e.g.
in sampling [?], network reliability [?], etc.), it does not shed any light on the distribution of edges
according to their connectivities in cuts. We generalize the above theorem and show that though there may
be many distinct cuts of a fixed large weight in a graph, there are a small number of distinct sets of edges in
these cuts if we restrict our attention to only edges with large (standard) connectivity. To state our theorem
precisely, we need to introduce the notion of k-heavy and k-light edges, and that of the k-projection of a cut.

Definition 4. An edge is said to be k-heavy if it has connectivity at least k, and k-light otherwise. The 
k-projection of a cut is the set of k-heavy edges in the cut. 
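
Definitions 3 and 4 can be illustrated with a short Python sketch that computes the standard connectivity of an edge as a maximum flow and then extracts the k-projection of a cut. The augmenting-path routine below is the textbook Edmonds-Karp method, chosen here only for brevity; the function names and the (u, v, w) edge-list format with integer weights are assumptions made for this illustration.

```python
from collections import defaultdict, deque

def connectivity(edges, s, t):
    """Max flow between s and t in an undirected graph given as (u, v, w) triples."""
    cap = defaultdict(lambda: defaultdict(int))
    for u, v, w in edges:
        cap[u][v] += w  # an undirected edge acts as a pair of opposite arcs
        cap[v][u] += w
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path (Edmonds-Karp).
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            x = queue.popleft()
            for y, c in cap[x].items():
                if c > 0 and y not in parent:
                    parent[y] = x
                    queue.append(y)
        if t not in parent:
            return flow
        # Find the bottleneck capacity, then push that much flow along the path.
        bottleneck, y = float("inf"), t
        while parent[y] is not None:
            bottleneck = min(bottleneck, cap[parent[y]][y])
            y = parent[y]
        y = t
        while parent[y] is not None:
            cap[parent[y]][y] -= bottleneck
            cap[y][parent[y]] += bottleneck
            y = parent[y]
        flow += bottleneck

def k_projection(edges, side, k):
    """The k-heavy edges (connectivity at least k) that cross the cut (side, rest)."""
    return [(u, v, w) for u, v, w in edges
            if (u in side) != (v in side) and connectivity(edges, u, v) >= k]
```

For a unit-weight triangle on a, b, c with a pendant edge (c, d), the three triangle edges have connectivity 2 and the pendant edge has connectivity 1. The 2-projection of the cut isolating a therefore consists of both edges at a, while the 2-projection of the cut isolating d is empty.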

Since every edge has connectivity at least c, Theorem 6 can be interpreted as bounding the number of distinct
k-projections of cuts of size $\alpha k$ by $O(n^{2\alpha})$ for k = c. We generalize this result to arbitrary values of k.

3 $\ln n = \log_e n$; $\lg n = \log_2 n$.

Theorem 7. For any undirected graph with edge connectivity c and for any $k \ge c$ and any $\alpha \ge 1$, the number
of distinct k-projections of cuts of weight at most $\alpha k$ is at most $n^{2\alpha}$.

We believe this theorem will be of independent interest. 

Roadmap. In section 2, we describe our generic sampling framework, and provide one example of instantiating
this framework that proves Theorem 3 for the unweighted case. In section 3, we prove Theorem 7
and use it to prove Theorem 8, the main framework theorem stated in section 2. In section 4, we give
two sampling algorithms for graphs with polynomial edge weights: the first algorithm constructs skeletons
containing $O(n \log^2 n/\varepsilon^2)$ edges in expectation and has time complexity $O(m)$, thus proving Theorem 3 for the
polynomial weights case; the second algorithm constructs skeletons containing $O(n \log n/\varepsilon^2)$ edges in expectation
and has time complexity $O(m \log n)$ for unweighted graphs, and $O(m \log^2 n)$ for graphs with polynomial
edge weights, thus proving Theorem 4. Combining these two theorems proves Theorem 2. In section 5,
we prove Theorem 5 and show that results on sampling by effective resistances and sampling by strong
connectivities can also be derived from our framework. Finally, in section 6, we give a sampling algorithm
for graphs with arbitrary edge weights that constructs skeletons containing $O(n \log^2 n/\varepsilon^2)$ edges in expectation
and has time complexity $O(m \log^2 n)$, thus proving Theorem 3 for the arbitrary weights case.

2 The Generic Framework 

We describe a generic sampling framework — each of our individual sampling schemes is obtained by a 
particular setting of parameters of this generic framework. 

Suppose G = (V, E) is an undirected graph where edge $e \in E$ has weight $w_e$. We will assume throughout
that $w_e$ is a positive integer. Let $G_M = (V, E_M)$ denote the multigraph constructed by replacing each edge e
by $w_e$ unweighted parallel edges $e_1, e_2, \ldots, e_{w_e}$. Consider any $\varepsilon \in (0, 1]$. We construct a skeleton $G_\varepsilon$ where
each edge $e_j \in E_M$ is present in graph $G_\varepsilon$ independently with probability $p_e$, and if present, it is given a
weight of $1/p_e$. (For algorithmic efficiency, observe that an identical skeleton can be created by assigning
to edge e a weight of $R_e/p_e$ where $R_e$ is generated from the binomial distribution $B(w_e, p_e)$; this can be done
in time $O(w_e p_e)$ rather than time $O(w_e)$; see e.g. [?].)
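
One standard way to draw $R_e$ in expected time proportional to $w_e p_e$ rather than $w_e$ is to jump directly from one successful trial to the next using geometrically distributed gaps. The Python sketch below illustrates this idea; it is not necessarily the method of the reference cited above, and the names `binomial_by_skipping` and `skeleton_weight` are hypothetical.

```python
import math
import random

def binomial_by_skipping(w, p, rng):
    """Draw R ~ B(w, p) in expected O(w * p + 1) steps via geometric gaps."""
    if p <= 0.0:
        return 0
    if p >= 1.0:
        return w
    log_q = math.log1p(-p)   # log(1 - p), which is negative
    count, pos = 0, 0        # pos is the index of the last successful trial
    while True:
        u = 1.0 - rng.random()            # uniform on (0, 1], so log(u) is finite
        gap = int(math.log(u) / log_q)    # failures before the next success
        pos += gap + 1
        if pos > w:
            return count
        count += 1

def skeleton_weight(w_e, p_e, rng):
    """Weight of edge e in the skeleton: R_e / p_e with R_e ~ B(w_e, p_e)."""
    return binomial_by_skipping(w_e, p_e, rng) / p_e

rng = random.Random(0)
print(skeleton_weight(1000, 0.01, rng))  # a multiple of 1 / 0.01; 1000 in expectation
```

The number of loop iterations is one more than the number of sampled copies, which is $w_e p_e + 1$ in expectation, and the returned weight has expectation $w_e$.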

What values of $p_e$ result in a sparse $G_\varepsilon$ that satisfies $G_\varepsilon \in (1 \pm \varepsilon)G$? Let $p_e = \min\left(\frac{96\alpha \ln n}{0.38\lambda_e\varepsilon^2}, 1\right)$, where
$\alpha$ is independent of e and $\lambda_e$ is some parameter of e satisfying $\lambda_e \le 2^n - 1$. The exact choice of values for
$\alpha$ and the $\lambda_e$'s will vary from application to application. However, we describe below a sufficient condition
that characterizes a good choice of $\alpha$ and $\lambda_e$'s.

To describe this sufficient condition, partition the edges in $G_M$ according to the value of $\lambda_e$ into sets
$F_0, F_1, \ldots, F_k$ where $k = \lfloor \lg \max_{e \in E}\{\lambda_e\} \rfloor \le n - 1$ and $e_j \in F_i$ iff $2^i \le \lambda_e \le 2^{i+1} - 1$. Now, let
$\mathcal{G} = \{G_0, G_1, G_2, \ldots, G_i = (V, E_i), \ldots, G_k\}$ be a set of subgraphs of $G_M$ (we allow edges of $G_M$ to be replicated
multiple times in the $G_i$'s) such that $F_i \subseteq E_i$ for every i. $\mathcal{G}$ is said to be a $(\pi, \alpha)$-certificate corresponding to
the above choice of $\alpha$ and $\lambda_e$'s if the following properties are satisfied:

$\pi$-connectivity: For $i \ge 0$, any edge $e_j \in F_i$ is $\pi$-heavy in $G_i$.

$\alpha$-overlap: For any cut C containing c edges in $G_M$, let $e_i^{(C)}$ be the number of edges that cross C in $G_i$. Then,
for all cuts C, $\sum_{i=0}^{k} \frac{e_i^{(C)} 2^{i-1}}{\pi} \le \alpha c$.

Theorem 8 describes the sufficient condition; its proof appears later in section 3. The intuition for this
proof is as follows. Consider all cuts C in $G_M$; restrict each cut to just the edges in $F_i$ (we do this because
edges in $F_i$ have roughly the same sampling probabilities, which enables an easy application of Chernoff
bounds). How many such distinct $F_i$-restricted cuts are there? Organize all cuts C in $G_M$ into doubling
categories, each comprising cuts with roughly equal values of $e_i^{(C)}$; now using Theorem 7 as applied to $G_i$
and the $\pi$-connectivity property above, we can conclude that this count is $n^{O(e_i^{(C)}/\pi)}$ per category. Next, for
a particular cut C and its $F_i$-restriction, we need to apply an appropriate Chernoff bound with a carefully
chosen deviation-from-expectation parameter so that this deviation has probability at most $n^{-\Omega(e_i^{(C)}/\pi)}$; this
probability offsets the above count, thereby allowing us to claim that this deviation bound holds for all cuts in one
doubling category (and the number of categories is not too many, so the same fact extends across categories
as well). The actual value of this deviation comes out to be $O(\varepsilon) \cdot \frac{e_i^{(C)} 2^{i-1}}{\pi\alpha}$. The $\alpha$-overlap property now
allows us to bound the sum of this deviation over all i, $0 \le i \le k$, by $\varepsilon c$, as required.

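Theorem 8. If there exists a $(\pi, \alpha)$-certificate for a particular choice of $\alpha$ and $\lambda_e$'s, then the skeleton
$G_\varepsilon \in (1 \pm \varepsilon)G$ with probability at least $1 - 4/n$. Further, $G_\varepsilon$ has $O\left(\frac{\alpha \log n}{\varepsilon^2} \sum_{e \in E} \frac{w_e}{\lambda_e}\right)$ edges in expectation.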

2.1 A Simple Algorithm for Unweighted Graphs 

We show how we can instantiate the above framework with specific values of $\alpha$, $\pi$ and the $\lambda_e$'s to obtain a very simple
sampling algorithm that runs in $O(m)$ time and obtains a skeleton of size $O(n \log^2 n/\varepsilon^2)$. This proves Theorem 3
for the unweighted case.

In order to present our sampling algorithm, we need to define the notion of spanning forests. As earlier,
G denotes a graph with integer edge weights $w_e$ for edge e and $G_M$ is the unweighted multigraph where e
is replaced with $w_e$ parallel unweighted edges.

Definition 5. A spanning forest T of $G_M$ (or equivalently of G) is an (unweighted) acyclic subgraph of G
satisfying the property that any two vertices are connected in T if and only if they are connected in G.

We partition the set of edges in $G_M$ into a set of forests $T_1, T_2, \ldots$ using the following rule: $T_j$ is a spanning
forest of the graph formed by removing all edges in $T_1, T_2, \ldots, T_{j-1}$ from $G_M$ such that for any edge $e \in G$, all
its copies in $G_M$ appear in a set of contiguous forests $T_{i_e}, T_{i_e+1}, \ldots, T_{i_e+w_e-1}$. This partitioning technique was
introduced by Nagamochi and Ibaraki in [19], and these forests are known as Nagamochi-Ibaraki forests (or
NI forests). The following is a basic property of NI forests.

Lemma 1 (Nagamochi-Ibaraki [19, 18]). For any pair of vertices u, v, they are connected in NI forests
$T_1, T_2, \ldots, T_{k(u,v)}$ for some k(u,v) and not connected in any forest $T_j$, for j > k(u,v).

Nagamochi and Ibaraki also gave an algorithm for constructing NI forests that runs in $O(m + n)$ time if $G_M$
is a simple graph (i.e. G is unweighted) and $O(m + n \log n)$ time otherwise [19, 18]. Note that our sampling
schemes are relevant only when $m > n \log n$; therefore, the NI forests can be constructed in $O(m)$ time for
all relevant input graphs.

We set $\lambda_e$ to the index of the NI forest that e appears in, and set $\alpha = 2$ and $\pi = 2^{i-1}$. For any i > 0,
let $G_i$ contain all edges in NI forests $T_{2^{i-1}}, T_{2^{i-1}+1}, \ldots, T_{2^{i+1}-1}$; let $G_0 = F_0 = T_1$. Each edge in $F_i$ appears
exactly once in $G_i$, once in $G_{i+1}$, and does not appear at all in any of the other $G_j$'s, $j \ne i, i+1$. This proves
$\alpha$-overlap. Further, for any edge $e \in F_i$, i > 0, Lemma 1 ensures that the endpoints of e are connected in
each of $T_{2^{i-1}}, T_{2^{i-1}+1}, \ldots, T_{2^i-1}$. It follows that e is $2^{i-1}$-heavy in $G_i$, thereby proving $\pi$-connectivity. We can
now invoke Theorem 8 and conclude that this sampling scheme results in $G_\varepsilon \in (1 \pm \varepsilon)G$ with probability at
least $1 - 4/n$. It remains to bound the number of edges in $G_\varepsilon$, as follows.

Since $w_e = 1$ for each edge e and the total number of NI forests K is at most $n^2$, we have

$$\sum_{e \in E} \frac{w_e}{\lambda_e} = \sum_{e \in E} \frac{1}{\lambda_e} = \sum_{j=1}^{K} \sum_{e \in T_j} \frac{1}{\lambda_e} = \sum_{j=1}^{K} \sum_{e \in T_j} \frac{1}{j} \le (n-1) \sum_{j=1}^{K} \frac{1}{j} = O(n \log K) = O(n \log n).$$

It follows from Theorem 8 that $G_\varepsilon$ has $O(n \log^2 n/\varepsilon^2)$ edges.

The time complexity for constructing the NI forests is $O(m)$ and that for sampling is $O(1)$ per edge
giving another $O(m)$; so overall, the algorithm takes $O(m)$ time.
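
The scheme of this section can be sketched in Python for a simple unweighted graph as follows. The sketch does not implement the linear-time Nagamochi-Ibaraki construction; as a stand-in it uses a naive greedy rule that places each edge in the first forest where its endpoints are not yet connected, which yields a sequence of spanning forests of the required kind but may take time proportional to m times the number of forests. The function names are hypothetical, and the probability formula is the one used in the framework of section 2 with $\alpha = 2$.

```python
import math

def forest_indices(n, edges):
    """For a simple graph on vertices 0..n-1 (no self-loops), place each edge in
    the first forest where its endpoints are not yet connected; return the
    1-based forest index of every edge."""
    parents = []  # parents[j] is the union-find parent array of forest j + 1

    def find(parent, x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    index = []
    for u, v in edges:
        j = 0
        while j < len(parents) and find(parents[j], u) == find(parents[j], v):
            j += 1
        if j == len(parents):
            parents.append(list(range(n)))  # open a new, empty forest
        parents[j][find(parents[j], u)] = find(parents[j], v)
        index.append(j + 1)
    return index

def sampling_probabilities(n, edges, eps, alpha=2.0):
    """p_e = min(96 * alpha * ln n / (0.38 * lambda_e * eps^2), 1), where
    lambda_e is the index of the forest containing e."""
    lam = forest_indices(n, edges)
    scale = 96 * alpha * math.log(n) / (0.38 * eps ** 2)
    return [min(scale / l, 1.0) for l in lam]

K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(forest_indices(4, K4))  # [1, 1, 1, 2, 2, 3]
```

On the complete graph with four vertices the forests have 3, 2 and 1 edges, so $\sum_e 1/\lambda_e = 3 + 1 + 1/3$, in line with the harmonic-sum bound above. The constants in $p_e$ are large, so sampling probabilities drop below 1 only for edges in forests of index well above $\log n/\varepsilon^2$, i.e. only in sufficiently dense graphs.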

3 Proofs of Main Theorems 

In this section, we will first prove Theorem 7 and then use it to prove Theorem 8. Let us start by defining
k-heavy and k-light vertices.

Definition 6. A vertex in an undirected graph is said to be k-heavy if at least one edge incident on the vertex 
is k-heavy; otherwise, the vertex is said to be k-light. 

We need the following property of k-heavy vertices.

Lemma 2. The sum of weights of edges incident on a k-heavy vertex is at least k. 

Proof. For any k-heavy vertex v, there exists some other vertex u such that the maxflow between u and v is
at least k. Thus, any cut separating u and v must have weight at least k; in particular, this holds for the cut
containing only v on one side. □

Suppose G is any weighted undirected graph. We scale up the weights of all edges in G uniformly until
the weight of every edge is an even integer; call this graph $G_S$. We replace each edge e = (u,v) of weight $w_e$
in $G_S$ with $w_e$ parallel unweighted edges between u and v to form an unweighted multigraph $G_M$. Clearly,
any cut in $G_M$ has an even number of edges. Theorem 7 holds for any value of k in G if and only if it holds
for any even integer k in $G_M$. Therefore, it suffices to prove Theorem 7 for all even integers k on unweighted
multigraphs where the weight of every cut is even. We also assume that $G_M$ is connected; if not, the theorem
holds for the entire graph since it holds for each connected component.

We introduce two operations on undirected multigraphs: splitting-off and edge contraction. The splitting-off
operation was introduced by Lovász in [13, 14] (ex. 6.53):

Definition 7. A pair of edges (s,u) and (u,t) are said to be split-off in an undirected multigraph if they are 
replaced by a single edge (s,t). 
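
As a small illustration of Definition 7, the Python sketch below performs one splitting-off step on a multigraph stored as a list of vertex pairs. The representation and the names `split_off` and `cut_size` are assumptions made for this example; dropping the self-loop that arises when s = t is also a convention chosen here, since such a loop crosses no cut.

```python
def split_off(edges, s, u, t):
    """Replace one copy of (s, u) and one copy of (u, t) by a single edge (s, t)."""
    edges = list(edges)  # work on a copy of the multigraph
    for a, b in ((s, u), (u, t)):
        # Edges are undirected, so accept either orientation.
        e = (a, b) if (a, b) in edges else (b, a)
        edges.remove(e)  # raises ValueError if the edge is absent
    if s != t:
        edges.append((s, t))
    return edges

def cut_size(edges, side):
    """Number of edges with exactly one endpoint in `side`."""
    return sum((a in side) != (b in side) for a, b in edges)

C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a 4-cycle
after = split_off(C4, 0, 1, 2)
print(after)                                       # [(2, 3), (3, 0), (0, 2)]
print(cut_size(C4, {1}), cut_size(after, {1}))     # 2 0
```

The cut isolating vertex 1 loses both of its edges, while any cut that keeps vertex 1 together with at least one of its two former neighbours is unchanged; in general a splitting-off step changes the size of a cut by either 0 or 2.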

Various properties of the splitting-off operation have been explored [15, 16, 5, 25]. We need the following
property.

Definition 8. For any k > 0, a splitting-off operation is said to be k-preserving if all edges in the graph
(except those being split-off) that were k-heavy before the splitting-off continue to be k-heavy after the
splitting-off.

The following lemma is a corollary of a deep result of Mader [15] for splitting-off edges while maintaining
the maxflows of pairs of vertices; however, we give a much simpler direct proof of this lemma here.

Lemma 3. Suppose $G_M$ is an undirected multigraph where every cut contains an even number of edges. Let
k > 0 be any even integer. Then, for any k-light non-isolated vertex u in $G_M$, there exists a pair of edges
(s,u) and (u,t) such that splitting-off this pair is k-preserving.

Proof. We will prove that for every edge (s,u), there exists an edge (u,t) such that splitting-off this pair of
edges retains the following property: any pair of vertices x, y that were k-connected (i.e. had a maxflow of
at least k) before the splitting-off continue to be so after the splitting-off. We define a k-separator to be any
cut that separates at least one pair of k-connected vertices, and call a k-separator with exactly k edges a tight
cut. Since all cuts have an even number of edges and the weight of a cut can decrease by at most 2 due to a
splitting-off operation, we only need to ensure that we do not decrease the number of edges in any tight cut
when we split-off a pair of edges.

Suppose there exists no edge (u,t) such that splitting-off (s,u) and (u,t) retains the k-heavy property for
all k-heavy edges. Then, for every neighbor t (other than s) of u, there exists at least one tight cut having s, t
on one side and u on the other. Consider a minimum-sized collection of tight cuts $X_1, X_2, \ldots, X_\ell$, where $X_i$ is
the subset of vertices on the side of the cut not containing u. If $\ell = 1$, moving u to the side of $X_1$ produces a
k-separator containing fewer than k edges, which is a contradiction. Thus $\ell \ge 2$. Now, let

$A = X_1 \cap X_2$; $B = X_1 \setminus X_2$; $C = X_2 \setminus X_1$; $D = V \setminus (X_1 \cup X_2)$.

Then, $s \in A$ and $u \in D$. Since $X_1$ and $X_2$ are k-separators, either (1) A and D are k-separators, or (2) B and
C are k-separators. In either case, this pair of k-separators must be tight cuts since they contain at least
k edges each, being k-separators, and at most k edges each because their total number of edges is at most
that of $X_1$ and $X_2$. If A and D are tight cuts, we can replace cuts $X_1$ and $X_2$ by D in the collection of tight
cuts, contradicting minimality of this collection. On the other hand, if B and C are tight cuts, the counting
argument also shows that there is no edge between A and D, contradicting the existence of edge (s,u). □

Let us now extend the notion of splitting-off to vertices. 

Definition 9. A vertex with even degree in an undirected graph is said to be split-off if a pair of edges
incident on it is repeatedly split-off until the vertex becomes isolated. Splitting-off of a vertex is said to be
k-preserving if each constituent edge splitting-off is k-preserving.

Note that the number of edges in a cut either stays unchanged or decreases by 2 after a splitting-off operation.
Thus, if every cut in the graph had an even number of edges to start with, then each cut continues to have
an even number of edges after a sequence of splitting-off operations. Therefore, the following lemma is
obtained by repeatedly applying Lemma 3 to a k-light vertex.

Lemma 4. Suppose $G_M$ is an undirected multigraph where the number of edges in every cut is even. Let k
be an even integer. Then, there exists a k-preserving splitting-off of any non-isolated k-light vertex u in $G_M$.

Our second operation is edge contraction. 

Definition 10. Contraction of edge e = (u,v) in an undirected multigraph G is defined as merging u and v
into a single vertex (i.e. all edges incident on either u or v are now incident on the new vertex instead). Any
self-loops produced by edges between u and v are discarded.
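
Definition 10 translates directly into code. The following Python sketch contracts an edge in a multigraph stored as a list of vertex pairs; the function name `contract` and the choice to reuse the label u for the merged vertex are assumptions of this illustration.

```python
def contract(edges, u, v):
    """Merge v into u in a multigraph given as a list of vertex pairs,
    discarding the self-loops created by edges between u and v."""
    merged = []
    for a, b in edges:
        a = u if a == v else a  # redirect endpoints from v to the merged vertex
        b = u if b == v else b
        if a != b:
            merged.append((a, b))
    return merged

M = [(0, 1), (0, 1), (1, 2), (0, 2)]
print(contract(M, 0, 1))  # [(0, 2), (0, 2)]
```

Both parallel copies of the contracted edge disappear, and every cut that keeps u and v on the same side retains exactly the same crossing edges.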

We will now prove Theorem 7.

Proof of Theorem 7. We run the following randomized algorithm on multigraph $G_M$:

1. Split-off all k-light vertices ensuring the k-preserving property (Lemma 4).

2. Contract an edge chosen uniformly at random in the resulting graph.

3. If the contraction produces a k-light vertex, split it off⁴.

4. If at most $2\alpha$ vertices are left, output a random cut; otherwise, go to step 2.

Consider a cut C that has at most $\alpha k$ edges; let its k-projection be S. In any of the splitting-off operations, no
edge in S can be split-off since these edges continue to be k-heavy throughout the execution of the algorithm.
So, if no edge crossing cut C (either an edge in $G_M$ or one produced by the splitting-off operations) is
contracted during the execution of the algorithm, then all edges in S survive till the end. To estimate the
probability that no edge crossing cut C is contracted, let $h_j$ be the number of vertices left at the beginning
of the jth iteration. Thus, $h_1$ is the number of k-heavy vertices in $G_M$ (note that all k-light vertices are
split-off initially), and $h_{j+1}$ is either $h_j - 1$ or $h_j - 2$ depending on whether a vertex was split-off in step 3 of
iteration j. Observe that the number of edges crossing C cannot increase due to the splitting-off operations.
Further, Lemma 2 asserts that at the beginning of iteration j, there are at least $h_j k/2$ edges in the graph.
Thus, the probability that no edge in C is selected for random contraction in step 2 of iteration j is at least
$1 - \frac{\alpha k}{h_j k/2} = 1 - \frac{2\alpha}{h_j}$. Then, the probability that no edge crossing C is contracted in the entire execution of the
algorithm is at least

$$\prod_j \left(1 - \frac{2\alpha}{h_j}\right) \ge \prod_{i=2\alpha+1}^{n} \left(1 - \frac{2\alpha}{i}\right) = \binom{n}{2\alpha}^{-1}.$$

Since there are $2^{2\alpha-1}$ cuts in a graph with $2\alpha$ vertices, the probability that the random cut output by the
algorithm contains only edges crossing cut C (and therefore S is exactly the set of k-heavy edges in $G_M$
output by the algorithm) is at least $\binom{n}{2\alpha}^{-1} 2^{1-2\alpha} \ge n^{-2\alpha}$. This is true for every distinct k-projection of cuts
having at most $\alpha k$ edges; hence, the total number of such k-projections is at most $n^{2\alpha}$. □

In addition to the above theorem, we need the following non-uniform version of Chernoff bounds (for
Chernoff bounds, see e.g. [?]) to prove Theorem 8. (A proof of this theorem is given in the appendix.)

Theorem 9. Consider any subset C of unweighted edges, where each edge $e \in C$ is sampled independently
with probability $p_e$ for some $p_e \in [0, 1]$ and given weight $1/p_e$ if selected in the sample. Let the random
variable $X_e$ denote the weight of edge e in the sample; if e is not selected in the sample, then $X_e = 0$. Then,
for any p such that $p \le p_e$ for all edges e, any $\varepsilon \in (0, 1]$, and any $N \ge |C|$, the following bound holds⁵:

$$\mathbb{P}\left[\left|\sum_{e \in C} X_e - |C|\right| > \varepsilon N\right] < 2e^{-0.38\varepsilon^2 pN}.$$



We will now use Theorem 7 to prove Theorem 8. (We re-use the notation defined in section 2.) For any cut
C in $G_M$, let $F_i^{(C)} = F_i \cap C$ and $E_i^{(C)} = E_i \cap C$ for $0 \le i \le k$⁶; let $f_i^{(C)} = |F_i^{(C)}|$ and $e_i^{(C)} = |E_i^{(C)}|$. Also, let $\hat{f}_i^{(C)}$
be the total weight of all edges in $F_i^{(C)}$ in the skeleton graph $G_\varepsilon$. We first prove a key lemma.

4 If an edge between u and v is contracted in step 2, all edges that were previously k-heavy continue to be so after the contraction,
except the edges between u and v. So, at most one vertex (the new vertex) becomes k-light as a result of this contraction.
5 For any event $\mathcal{E}$, $\mathbb{P}[\mathcal{E}]$ represents the probability of event $\mathcal{E}$.

6 For any cut C and any set of edges Z, $Z \cap C$ denotes the set of edges in Z that cross cut C.

Lemma 5. For any fixed i, with probability at least $1 - 4/n^2$,

$$\left|f_i^{(C)} - \hat{f}_i^{(C)}\right| \le \frac{\varepsilon}{2} \max\left(\frac{e_i^{(C)} 2^{i-1}}{\pi\alpha},\ f_i^{(C)}\right)$$

for all cuts C in $G_M$.



(c) 

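Proof. By the $\pi$-connectivity property, any edge $e \in F_i$ is $\pi$-heavy in $G_i$ for any $i \ge 0$. Therefore, $e_i^{(C)} \ge \pi$.
Let $\mathcal{C}_{ij}$ be the set of all cuts C such that $\pi 2^j \le e_i^{(C)} \le \pi 2^{j+1} - 1$, $j \ge 0$. We will prove that with probability
at least $1 - 2n^{-2^{j+1}}$, all cuts in $\mathcal{C}_{ij}$ satisfy the property of the lemma. Then, the lemma follows by using the
union bound over j (keeping i fixed) since $2n^{-2} + 2n^{-4} + \ldots + 2n^{-2^{j+1}} + \ldots \le 4n^{-2}$.

We now prove the above claim for cuts $C \in \mathcal{C}_{ij}$. Let $X_i^{(C)}$ denote the set of edges in $F_i^{(C)}$ that are sampled
with probability strictly less than 1; correspondingly, let $x_i^{(C)} = |X_i^{(C)}|$ and let $\hat{x}_i^{(C)}$ be the total weight of
edges in $X_i^{(C)}$ in the skeleton graph $G_\varepsilon$. Since edges in $F_i^{(C)} \setminus X_i^{(C)}$ have a weight of exactly 1 in $G_\varepsilon$, it is
sufficient to show that with probability at least $1 - 2n^{-2^{j+1}}$,

$$\left|x_i^{(C)} - \hat{x}_i^{(C)}\right| \le \frac{\varepsilon}{2} \max\left(\frac{e_i^{(C)} 2^{i-1}}{\pi\alpha},\ x_i^{(C)}\right)$$

for all cuts $C \in \mathcal{C}_{ij}$. Since each edge $e \in X_i^{(C)}$ has $\lambda_e < 2^{i+1}$, we can use Theorem 9 with the lower bound on
probabilities $p = \frac{96\alpha \ln n}{0.38 \cdot 2^{i+1} \varepsilon^2}$. There are two cases. In the first case, suppose $x_i^{(C)} \le \frac{e_i^{(C)} 2^{i-1}}{\pi\alpha}$. Then, for any $X_i^{(C)}$
where $C \in \mathcal{C}_{ij}$, by Theorem 9 we have

$$\mathbb{P}\left[\left|x_i^{(C)} - \hat{x}_i^{(C)}\right| > \frac{\varepsilon}{2} \cdot \frac{e_i^{(C)} 2^{i-1}}{\pi\alpha}\right] < 2\exp\left(-0.38 \cdot \frac{\varepsilon^2}{4} \cdot \frac{96\alpha \ln n}{0.38 \cdot 2^{i+1} \varepsilon^2} \cdot \frac{e_i^{(C)} 2^{i-1}}{\pi\alpha}\right) = 2\exp\left(-\frac{6 e_i^{(C)} \ln n}{\pi}\right) \le 2e^{-6 \cdot 2^j \ln n},$$

since $e_i^{(C)} \ge \pi 2^j$ for any $C \in \mathcal{C}_{ij}$. In the second case, suppose $x_i^{(C)} > \frac{e_i^{(C)} 2^{i-1}}{\pi\alpha}$. Then, for any $X_i^{(C)}$ where
$C \in \mathcal{C}_{ij}$, by Theorem 9 we have

$$\mathbb{P}\left[\left|x_i^{(C)} - \hat{x}_i^{(C)}\right| > \frac{\varepsilon}{2} x_i^{(C)}\right] < 2\exp\left(-0.38 \cdot \frac{\varepsilon^2}{4} \cdot \frac{96\alpha \ln n}{0.38 \cdot 2^{i+1} \varepsilon^2} \cdot x_i^{(C)}\right) < 2\exp\left(-\frac{6 e_i^{(C)} \ln n}{\pi}\right) \le 2e^{-6 \cdot 2^j \ln n},$$

since $x_i^{(C)} > \frac{e_i^{(C)} 2^{i-1}}{\pi\alpha} \ge \frac{2^{i+j-1}}{\alpha}$ for any $C \in \mathcal{C}_{ij}$. Thus, we have proved that

$$\mathbb{P}\left[\left|x_i^{(C)} - \hat{x}_i^{(C)}\right| > \frac{\varepsilon}{2} \max\left(\frac{e_i^{(C)} 2^{i-1}}{\pi\alpha},\ x_i^{(C)}\right)\right] < 2e^{-6 \cdot 2^j \ln n} = 2n^{-6 \cdot 2^j}$$

for any cut $C \in \mathcal{C}_{ij}$. Now, by the $\pi$-connectivity property, we know that edges in $F_i^{(C)}$, and therefore those
in $X_i^{(C)}$, are $\pi$-heavy in $G_i$. Therefore, by Theorem 7 the number of distinct $X_i^{(C)}$ sets for cuts $C \in \mathcal{C}_{ij}$ is
at most $n^{2(\pi 2^{j+1}/\pi)} = n^{4 \cdot 2^j}$. Using the union bound over these distinct $X_i^{(C)}$ edge sets, we conclude that with
probability at least $1 - 2n^{-2^{j+1}}$, all cuts in $\mathcal{C}_{ij}$ satisfy the property of the lemma. □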



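We now use the above lemma to prove Theorem 8.

Proof of Theorem 8. For any cut $C$ in $G_M$, let $c$ be the number of edges in $C$; correspondingly, let $\hat{c}$ be the total weight of the edges crossing cut $C$ in the skeleton graph $G_\epsilon$. Since $k \le n - 1$, we apply the union bound to the property from Lemma 5 over the different values of $i$ to conclude that with probability at least $1 - \frac{4}{n}$, we have $\sum_{i=0}^{k} |f_i^{(C)} - \hat{f}_i^{(C)}| \le \sum_{i=0}^{k} \frac{\epsilon}{2} \max\left( \frac{e_i^{(C)} 2^{i-1}}{\pi\alpha}, f_i^{(C)} \right)$ for all cuts $C$ in $G_M$. Then, with probability at least $1 - \frac{4}{n}$,

\[ |c - \hat{c}| = \left| \sum_{i=0}^{k} f_i^{(C)} - \sum_{i=0}^{k} \hat{f}_i^{(C)} \right| \le \sum_{i=0}^{k} \left| f_i^{(C)} - \hat{f}_i^{(C)} \right| \le \frac{\epsilon}{2} \sum_{i=0}^{k} \max\left( \frac{e_i^{(C)} 2^{i-1}}{\pi\alpha},\; f_i^{(C)} \right) \le \frac{\epsilon}{2} \left( \sum_{i=0}^{k} \frac{e_i^{(C)} 2^{i-1}}{\pi\alpha} + \sum_{i=0}^{k} f_i^{(C)} \right) \le \epsilon c, \]

since $\sum_{i=0}^{k} \frac{e_i^{(C)} 2^{i-1}}{\pi\alpha} \le c$ by the $\alpha$-overlap property and $\sum_{i=0}^{k} f_i^{(C)} \le c$ since the $F_i^{(C)}$s form a partition of the edges in $C$.

We now prove the size bound on $G_\epsilon$. The expected number of distinct edges in $G_\epsilon$ is

\[ \sum_{e} \left( 1 - (1 - p_e)^{w_e} \right) \le \sum_{e} w_e p_e. \]

The bound follows by substituting the value of $p_e$. □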
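The size bound above can be illustrated numerically. The sketch below assumes the sampling probability has the form $p_e = \min\left(\frac{96\alpha \ln n}{0.38\,\lambda_e \epsilon^2}, 1\right)$; this form is inferred from the constants appearing in the proof of Lemma 5 and is an assumption here, since the defining section is not reproduced in this part of the text. The function names are hypothetical.

```python
from math import log

def sampling_probability(lam, n, eps, alpha):
    """Assumed form of p_e (inferred from the proof of Lemma 5):
    min(96 * alpha * ln n / (0.38 * lambda_e * eps^2), 1)."""
    return min(1.0, 96 * alpha * log(n) / (0.38 * lam * eps**2))

def expected_distinct_edges(edges, n, eps, alpha):
    """edges: iterable of (w_e, lambda_e) pairs.
    Returns (sum_e 1 - (1 - p_e)^{w_e}, sum_e w_e * p_e)."""
    exact = 0.0
    upper = 0.0
    for w, lam in edges:
        p = sampling_probability(lam, n, eps, alpha)
        exact += 1 - (1 - p) ** w     # probability that some copy of e is sampled
        upper += w * p                # expected number of sampled copies
    return exact, upper

exact, upper = expected_distinct_edges(
    [(1, 1), (5, 10**6), (1000, 10**7)], n=100, eps=0.5, alpha=2)
print(exact, upper)
```

The inequality between the two sums is Bernoulli's inequality applied edge by edge, so the first returned value never exceeds the second.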

4 Sampling in Graphs with Polynomial Edge Weights 

In this section, we will give an algorithm for sampling in undirected weighted graphs, where the weight of every edge is an integer bounded by $n^d$ for a fixed constant $d > 0$. The algorithm constructs a skeleton graph containing $O\left(\frac{n \log n}{\epsilon^2}\right)$ edges in expectation and has time complexity $O\left(m + \frac{n \log^4 n}{\epsilon^2}\right)$. Our strategy, as outlined in the introduction, has two steps: first we run an algorithm that constructs a skeleton graph with $O\left(\frac{n \log^2 n}{\epsilon^2}\right)$ edges in expectation and has time complexity $O(m)$; then, we run a different algorithm that constructs a sparser skeleton containing $O\left(\frac{n \log n}{\epsilon^2}\right)$ edges in expectation on the skeleton graph constructed in the first step. The second algorithm takes time $O(m \log^2 n)$ on a graph with $m$ edges and therefore $O\left(\frac{n \log^4 n}{\epsilon^2}\right)$ time on the skeleton graph produced in the first step. To ensure that the final skeleton graph is in $(1 \pm \epsilon)G$, we choose $\epsilon/3$ as the error parameter for each algorithm. As an additional observation, we show that the time complexity of the second algorithm improves to $O(m \log n)$ if its input graph is unweighted.
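The choice of $\epsilon/3$ for each step can be checked directly. Assuming $\epsilon \in (0, 1]$, composing two $(1 \pm \epsilon/3)$ approximations keeps every cut within $(1 \pm \epsilon)$ of its original weight:

```latex
\left(1 + \frac{\epsilon}{3}\right)^2 = 1 + \frac{2\epsilon}{3} + \frac{\epsilon^2}{9}
  \le 1 + \frac{2\epsilon}{3} + \frac{\epsilon}{9} = 1 + \frac{7\epsilon}{9} \le 1 + \epsilon,
\qquad
\left(1 - \frac{\epsilon}{3}\right)^2 = 1 - \frac{2\epsilon}{3} + \frac{\epsilon^2}{9}
  \ge 1 - \frac{2\epsilon}{3} \ge 1 - \epsilon .
```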

We will describe both these algorithms for an input graph $G$, where the weight $w_e$ of every edge $e$ is an integer bounded by $n^d$ for a fixed constant $d > 0$. Note that the input graph to the second algorithm in the above two-step sampling scheme may have fractional weights. However, we can scale up all weights uniformly until they are integral, and the scaled weights continue to be bounded by some fixed polynomial in $n$. Once the skeleton graph is obtained, we scale all weights down uniformly to obtain the final skeleton graph. The unweighted multigraph constructed by replacing each edge $e = (u, v)$ with $w_e$ parallel unweighted edges $e_1, e_2, \ldots, e_{w_e}$ between $u$ and $v$ is denoted by $G_M$. Also, $T_1, T_2, \ldots$ denotes a set of NI forests of $G_M$; edge $e_j$ appears in forest $T_{i_e + j - 1}$, where $1 \le j \le w_e$. Thus, the copies of edge $e$ appear in NI forests $T_{i_e}, T_{i_e + 1}, \ldots, T_{i_e + w_e - 1}$. For both algorithms, we will use the generic sampling scheme described in Section 2.

Algorithm for Step 1. For any edge $e = (u, v)$, we choose $\lambda_e = i_e + w_e - 1$, i.e. the index of the last NI forest where a copy of $e$ appears; also set $\alpha = 2$ and $\pi = 2^{i-1}$. For any $i \ge 1$, define $G_i$ to be the graph containing all edges in NI forests $T_{2^{i-1}}, T_{2^{i-1}+1}, \ldots, T_{2^i - 1}$ (call this set of edges $Y_i$) and all edges in $F_i$, i.e. all edges $e$ with $2^i \le \lambda_e \le 2^{i+1} - 1$. Let $G_0$ only contain edges in $F_0$. For any $i \ne j$, $F_i \cap F_j = Y_i \cap Y_j = \emptyset$; thus, each edge appears in $G_i$ for at most two different values of $i$, proving $\alpha$-overlap. Further, for any edge $e \in F_i$, Lemma 1 ensures that the endpoints of $e$ are connected in each of $T_{2^{i-1}}, \ldots, T_{2^i - 1}$. It follows that $e$ is $2^{i-1}$-heavy in $G_i$, thereby proving $\pi$-connectivity.

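We now prove the size bound. For any edge $e' \in E_M$, let $t(e')$ be the index of the NI forest it appears in. Then,

\[ \sum_{e \in E} \frac{w_e}{\lambda_e} \le \sum_{e \in E} \sum_{j=1}^{w_e} \frac{1}{i_e + j - 1} = \sum_{e' \in E_M} \frac{1}{t(e')} = \sum_{j=1}^{K} \sum_{e' \in T_j} \frac{1}{j} \le (n-1) \sum_{j=1}^{K} \frac{1}{j} = O(n \log K) = O(n \log n), \]

where the last step follows from the observation that the total number of NI forests $K$ is at most $n^{d+2}$, where $d$ is a constant. Using Theorem 8, we conclude that the skeleton graph $G_\epsilon$ constructed by the above algorithm has $O\left(\frac{n \log^2 n}{\epsilon^2}\right)$ edges in expectation and is in $(1 \pm \epsilon)G$ whp.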

Time Complexity. The time complexity for constructing the NI forests, and therefore figuring out the $p_e$ values, is $O(m + n \log n)$. We sample each edge $e$ by setting its weight in the skeleton $G_\epsilon$ to $r_e / p_e$, where $r_e$ is drawn randomly from the Binomial distribution with parameters $w_e$ and $p_e$. This is clearly equivalent to the sampling scheme described above, and can be done in $w_e p_e$ expected time for each edge $e$ (see e.g. [17]), and therefore $O\left(\frac{n \log^2 n}{\epsilon^2}\right)$ time overall. Since $m \ge \frac{n \log^2 n}{\epsilon^2}$ for this algorithm to be invoked, the overall time complexity of the algorithm is $O(m)$.
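The binomial sampling step can be sketched as follows. This is a deliberately naive illustration: it draws $r_e$ by flipping $w_e$ coins, which takes $O(w_e)$ time per edge rather than the $w_e p_e$ expected time cited above, and the function name `sample_edge_weight` is a hypothetical helper, not part of the paper.

```python
import random

def sample_edge_weight(w_e, p_e, rng):
    """Return the skeleton weight r_e / p_e, where r_e ~ Binomial(w_e, p_e).

    Equivalent to sampling each of the w_e unweighted copies of the edge
    independently with probability p_e and giving each sampled copy
    weight 1 / p_e.
    """
    if p_e >= 1.0:
        return float(w_e)                     # every copy is kept with weight 1
    r_e = sum(1 for _ in range(w_e) if rng.random() < p_e)
    return r_e / p_e

rng = random.Random(0)                        # fixed seed for reproducibility
print(sample_edge_weight(7, 0.5, rng))
```

A faster generator (for example one based on geometric skips between successes) would be needed to match the stated expected running time.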

Algorithm for Step 2. Before describing our second sampling algorithm, we define the following operation on graphs. (Recall the definition of edge contraction given in Section 3.)

Definition 11. Let $G = (V, E)$ be an undirected graph, and let $V_1, V_2, \ldots, V_k$ be a partition of the vertices in $G$ such that for each $V_i$, the induced graph of $G$ on $V_i$ is connected. Then, shrinking $G$ with respect to $V_1, V_2, \ldots, V_k$ produces the graph formed by contracting all edges between vertices in the same $V_i$ for all $i$.
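A minimal sketch of the shrinking operation, under two simplifying assumptions that are not in the definition itself: parallel edges between two shrunk vertices are merged by adding their weights, and the connectivity of each part is taken for granted rather than verified. The function name `shrink` and the edge-list representation are illustrative.

```python
from collections import defaultdict

def shrink(edges, parts):
    """Shrink a weighted graph with respect to a vertex partition.

    edges: list of (u, v, weight) triples.
    parts: list of vertex sets forming a partition of the vertices.
    Returns {(a, b): weight} over part indices a < b; edges inside a
    part are contracted away.
    """
    part_of = {x: idx for idx, part in enumerate(parts) for x in part}
    shrunk = defaultdict(int)
    for u, v, w in edges:
        a, b = part_of[u], part_of[v]
        if a != b:                                   # edge survives contraction
            shrunk[(min(a, b), max(a, b))] += w      # merge parallel edges
    return dict(shrunk)

edges = [(1, 2, 1), (2, 3, 2), (1, 3, 1), (3, 4, 5)]
print(shrink(edges, [{1, 2}, {3}, {4}]))
```

In the example, vertices 1 and 2 are merged, so the edges (2, 3) and (1, 3) become parallel and their weights add up.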

Our sampling algorithm uses our generic sampling scheme where $\lambda_e$ is determined using the following algorithm. Here $H_c = (V_c, E_c)$ is a graph variable representing a weighted graph. The algorithm is described recursively; we call SetLambda($G$, 0) to execute it.

SetLambda($H$, $i$)

1. Set $H_c = H$

2. If total weight of edges in $E_c$ is at most $|V_c| \cdot 2^{i+1}$, then

(a) Set $\lambda_e = 2^i$ for all edges $e \in E_c$

(b) Remove all edges in $E_c$ from $H$; suppose $H$ splits into connected components $H_1, H_2, \ldots, H_k$

(c) For each $H_j$ containing at least 2 vertices, call SetLambda($H_j$, $i + 1$)

Else,

(a) Construct $2^i + 1$ NI forests $T_1, T_2, \ldots, T_{2^i + 1}$ for $H_c$

(b) Shrink $H_c$ wrt the connected components in $T_{2^i + 1}$; update $V_c$ and $E_c$ accordingly

(c) Go to step 2
Also, set $\alpha = 4$ and $\pi = 2^k$ where $k = \lfloor \lg \max_{e \in E} \{\lambda_e\} \rfloor$. For any $r$, recall that $F_r$ contains all $w_e$ unweighted copies of edge $e$ from $G_M$, where $e$ satisfies $2^r \le \lambda_e \le 2^{r+1} - 1$. For any $i \ge 1$, let $G_i$ contain all edges in $F_r$ for all $r \ge i - 1$, where each edge in $F_r$ is replicated $2^{k-r+1}$ times in $G_i$; let $G_0$ contain edges of $F_0$ where each edge is replicated $2^k$ times. We need the following lemma to prove that $\pi$-connectivity is satisfied.

Lemma 6. For any $j \ge 1$, consider any edge $e \in F_j$, i.e. an edge $e$ for which the above algorithm sets $\lambda_e = 2^j$. Then, $e$ is $2^{j-1}$-heavy in the graph $\bigcup_{r \ge j-1} F_r$.

Proof. For any edge $e$ in $F_j$, let $G_e = (V_e, E_e)$ be the component of $G$ containing $e$ such that SetLambda($G_e$, $j - 1$) was executed. We will show that $e$ is $2^{j-1}$-heavy in $G_e$; since $G_e$ is a subgraph of $G$, the lemma follows. In the execution of SetLambda($G_e$, $j - 1$), there are multiple shrinking operations, each of them comprising the contracting of a set of edges. We claim that any such contracted edge is $2^{j-1}$-heavy in $G_e$; it follows that any two vertices $u$ and $v$ that got shrunk into a single vertex are $2^{j-1}$-connected in $G_e$.

Let $G_e$ have $k$ shrinking phases; let the graph produced after shrinking phase $r$ be $G_{e,r}$. We now prove that all edges contracted in phase $r$ must be $2^{j-1}$-connected in $G_e$ by induction on $r$. For $r = 1$, since $e$ appears in the $(2^{j-1} + 1)$st NI forest of phase 1, $e$ is $2^{j-1}$-connected in $G_e$. For the inductive step, assume that the property holds for phases $1, 2, \ldots, r$. Any edge that is contracted in phase $r + 1$ appears in the $(2^{j-1} + 1)$st NI forest of phase $r + 1$; therefore, $e$ is $2^{j-1}$-connected in $G_{e,r}$. By the inductive hypothesis, all edges of $G_e$ contracted in previous phases are $2^{j-1}$-heavy in $G_e$; therefore, an edge that is $2^{j-1}$-heavy in $G_{e,r}$ must have been $2^{j-1}$-heavy in $G_e$. □

Consider any cut $C$ in $G$ containing an edge $e \in F_i$ for any $i \ge 0$. Let the corresponding cut (i.e. with the same bipartition of vertices) in $G_i$ be $C_i$. We need to show that the number of edges in $C_i$ is at least $2^k$ to prove $\pi$-connectivity. If $i = 0$, $e$ is replicated $2^k$ times in $G_0$, thereby proving the property. For $i \ge 1$, let the maximum $\lambda_a$ of an edge $a$ in $C$ be $k_C$, where $2^j \le k_C \le 2^{j+1} - 1$ for some $j \ge i$. By the above lemma, $C_i$ contains at least $2^{j-1}$ distinct edges of $G$, each of which is replicated at least $2^{k-j+1}$ times. Thus, $C_i$ contains at least $2^k$ edges.

We now prove $\alpha$-overlap. For any cut $C$, recall that $f_i^{(C)}$ and $e_i^{(C)}$ respectively denote the number of edges in $F_i \cap C$ and in $C_i$ (where $C_i$ is as defined in the previous paragraph). Then,

\[ \sum_{i=0}^{k} \frac{e_i^{(C)} 2^{i-1}}{\pi} = \frac{f_0^{(C)}}{2} + \sum_{i=1}^{k} \sum_{r=i-1}^{k} \frac{f_r^{(C)}\, 2^{k-r+1}\, 2^{i-1}}{2^k} = \frac{f_0^{(C)}}{2} + \sum_{r=0}^{k} \sum_{i=1}^{\min(r+1,\,k)} \frac{f_r^{(C)}}{2^{r-i}} \le 4 \sum_{r=0}^{k} f_r^{(C)} = 4c. \]

Define $D_i$ to be the set of connected components in the graph $G \setminus (F_0 \cup F_1 \cup \ldots \cup F_{i-1})$ for any $i \ge 1$; let $D_0$ be the single connected component in $G$. For any $i \ge 0$, if any connected component in $D_i$ remains intact in $D_{i+1}$, then there is no edge from that connected component in $F_i$. On the other hand, if a component in $D_i$ splits into $\eta$ components in $D_{i+1}$, then the algorithm explicitly ensures that the number of edges in $F_i$ from that connected component is at most $\eta 2^{i+1}$. Since each such edge has $\lambda_e = 2^i$, the contribution of these edges to the sum $\sum_{e \in E} \frac{w_e}{\lambda_e}$ is at most $2\eta \le 4(\eta - 1)$ (since $\eta \ge 2$). But, $\eta - 1$ is the increase in the number of components arising from this single component. Therefore, if $d_i = |D_i|$, then

\[ \sum_{e} \frac{w_e}{\lambda_e} \le \sum_{i \ge 0} 4(d_{i+1} - d_i) \le 4n \]

since ultimately we have $n$ singleton components. Using Theorem 8, we conclude that the skeleton graph $G_\epsilon$ constructed by the above algorithm has $O\left(\frac{n \log n}{\epsilon^2}\right)$ edges in expectation and is in $(1 \pm \epsilon)G$ whp.
Time Complexity. We show below that the algorithm to find values of $\lambda_e$ can be implemented in $O(m \log n)$ time for unweighted graphs, and $O(m \log^2 n)$ time for graphs with polynomial edge weights. Once we have obtained the sampling probabilities, we use the same trick as in the previous algorithm, i.e. sample from a Binomial distribution, to produce the skeleton in $O\left(\frac{n \log n}{\epsilon^2}\right)$ additional time. Since the algorithm is invoked only if $m \ge \frac{n \log n}{\epsilon^2}$, the total running time is $O(m \log n)$ if $G$ is unweighted and $O(m \log^2 n)$ otherwise.

We now determine the time complexity for finding the values of $\lambda_e$. Consider one call to SetLambda($H$, $i$) which begins with $H = (V, E)$ and let $H_c = (V_c, E_c)$ denote the graph $H$ as it evolves over the various iterations in this procedure. Each iteration of steps (a) and (b) in the else block takes $O(|V_c| \log n + |E_c|)$ time. We show that the number of vertices halves in each iteration (save the last) and therefore the total time over all iterations is $O(|V| \log n + |E| \log n)$. Since we are dealing with the case of polynomial edge weights, the depth of recursion is $O(\log n)$. Therefore, over all recursive calls, the time comes to $O(n \log^2 n + m \log^2 n) = O(m \log^2 n)$.

To see that the number of vertices halves from one iteration to the next, consider an iteration that begins with $E_c$ having weight at least $|V_c| \cdot 2^{i+1}$. $E_c$ for the next iteration (denoted by $E'_c$) comprises only edges in the first $2^i$ NI forests constructed in the current iteration. So the total weight of edges in $E'_c$ is at most $|V_c| \cdot 2^i$. If this is not the last iteration, then this weight exceeds $|V'_c| \cdot 2^{i+1}$. It follows that $|V'_c| \le |V_c|/2$, as required.

From the above description, note that for the unweighted case, $|E'_c| \le |E_c|/2$, and therefore the time taken over all iterations in one recursive call is $O(|V| + |E|)$. Over all recursive calls this comes to $O(m \log n)$.

5 Sampling Schemes using various Connectivity Parameters 

In this section, we present several sampling schemes using various measures of connectivity. Some of these 
results were previously known; however, we will show that these results follow as simple corollaries of our 
generic sampling scheme whereas the original proofs were specific to each scheme and substantially more 
complicated. The algorithms for implementing these schemes are less efficient than the algorithms that we 
have previously presented; therefore we restrict ourselves to structural results in this section. As earlier, $G$ is the weighted input graph (with arbitrary integer weights); $G_M$ is the corresponding unweighted multigraph; $T_1, T_2, \ldots, T_K$ is a set of NI forests of $G_M$.

5.1 Sampling using Standard Connectivities 

For any edge $e = (u, v)$, set $\lambda_e$ to the standard connectivity of the edge; also set $\alpha = 3 + \lg n$ and $\pi = 2^{i-1}$. $F_i$ is defined as the set of all edges $e$ with $2^i \le \lambda_e \le 2^{i+1} - 1$ for any $i \ge 0$. For any $i \ge 1 + \lg n$, let $G_i$ contain all edges in NI forests $T_{2^{i-1-\lg n}}, T_{2^{i-1-\lg n}+1}, \ldots, T_{2^{i+1}-1}$ and all edges in $F_i$. For $i \le \lg n$, $G_i$ contains all edges in $T_1, T_2, \ldots, T_{2^{i+1}-1}$ and all edges in $F_i$. For any $i \ge 0$, let $Y_i$ denote the set of edges in $G_i$ but not in $F_i$. For any $i \ne j$, $F_i \cap F_j = \emptyset$ and each edge appears in $Y_i$ for at most $2 + \lg n$ different values of $i$; this proves $\alpha$-overlap. To prove $\pi$-connectivity, we note that Lemma 1 ensures that for any pair of vertices $u, v$ with maximum flow $f(u, v)$ and for any $k \ge 1$, $u, v$ are at least $\min(f(u, v), k)$-connected in the union of the first $k$ NI forests, i.e. in $T_1 \cup T_2 \cup \ldots \cup T_k$. Thus, any edge $e \in F_i$ is at least $2^i$-heavy in the union of the NI forests $T_1, T_2, \ldots, T_{2^{i+1}-1}$. Since there are at most $2^{i-1}$ edges overall in $T_1, T_2, \ldots, T_{2^{i-1-\lg n}-1}$, any edge $e \in F_i$ is $2^{i-1}$-heavy in $G_i$. This proves $\pi$-connectivity.

We now prove the size bound. The next lemma is similar to its corresponding lemma for strong connectivity in [2].

Lemma 7. Suppose $G$ is an undirected graph where edge $e$ has weight $w_e$ and standard connectivity $k_e$. Then, $\sum_e \frac{w_e}{k_e} \le n - 1$.
Proof. We use induction on the number of vertices in the graph. For a graph with a single vertex and no edge, the lemma holds vacuously. Now, suppose the lemma holds for all graphs with at most $n - 1$ vertices. Let $C$ be a minimum cut in $G$, and let $\lambda$ be its weight. For any edge $e \in C$, $k_e = \lambda$. Thus, $\sum_{e \in C} \frac{w_e}{k_e} = 1$. We remove all edges in $C$ from $G$; this splits $G$ into two connected components $G_1$ and $G_2$ with $n_1$ and $n_2$ vertices respectively, where $n_1, n_2 \le n - 1$. Further, the standard connectivity of each edge in $G_1, G_2$ is at most that in $G$. Using the inductive hypothesis, we conclude that $\sum_{e \in G_1} \frac{w_e}{k_e} \le n_1 - 1$ and $\sum_{e \in G_2} \frac{w_e}{k_e} \le n_2 - 1$. We conclude that

\[ \sum_{e} \frac{w_e}{k_e} \le n_1 - 1 + n_2 - 1 + 1 = n - 1. \]

□
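Two small unweighted examples illustrate the bound of Lemma 7. In a tree on $n$ vertices every edge is a cut of weight 1, so the bound is tight; in a cycle on $n \ge 3$ vertices every edge has standard connectivity 2:

```latex
\text{tree: } \sum_{e} \frac{w_e}{k_e} = \sum_{e} \frac{1}{1} = n - 1,
\qquad
\text{cycle: } \sum_{e} \frac{w_e}{k_e} = \sum_{e} \frac{1}{2} = \frac{n}{2} \le n - 1 .
```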

Using Theorem 8, we conclude that the expected number of edges in the skeleton graph $G_\epsilon$ is $O\left(\frac{n \log^2 n}{\epsilon^2}\right)$ and $G_\epsilon \in (1 \pm \epsilon)G$ whp.

5.2 Sampling using Effective Resistances 

For any edge $e = (u, v)$, set $\lambda_e$ to the effective conductance of the edge, i.e. $\lambda_e = \frac{1}{R_e}$ where $R_e$ is the effective resistance of edge $e$. The next two lemmas imply that the skeleton $G_\epsilon \in (1 \pm \epsilon)G$ whp.

Lemma 8. Suppose that a sampling scheme (that uses the generic sampling scheme) has $\lambda_e \le k_e$ for each edge $e$ in graph $G$, where $k_e$ is the standard connectivity of $e$ in $G$. Then, the skeleton constructed is in $(1 \pm \epsilon)G$ whp.

Proof. We use the same definition of $\alpha$, $\pi$ and $G_i$s as in the sampling scheme with standard connectivities, and verify that $\pi$-connectivity and $\alpha$-overlap continue to be satisfied. □

Lemma 9. Suppose edge $e$ in an undirected graph $G$ has standard connectivity $k_e$ and effective resistance $R_e$. Then, $\frac{1}{R_e} \le k_e$.

Proof. Consider a cut $C$ of weight $k_e$ separating the terminals of edge $e$. We contract each side of this cut into a single vertex. In other words, we reduce the resistance on each edge, other than those in $C$, to 0. By Rayleigh's monotonicity principle (e.g. [4]), the effective resistance of $e$ does not increase due to this transformation. Since the effective resistance of $e$ after the transformation is $1/k_e$, $R_e \ge 1/k_e$ in the original graph. □

The size bound follows from the following well-known fact (see e.g. [22]).$^7$

Fact 1. If $R_e$ is the effective resistance of edge $e$ with weight $w_e$ in an undirected graph, then $\sum_e w_e R_e \le n - 1$.
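As a small worked instance of Fact 1 and Lemma 9, consider a triangle ($n = 3$) with unit weights. Between the endpoints of any edge there is a direct unit resistor in parallel with a path of two unit resistors, and the standard connectivity of every edge is $k_e = 2$:

```latex
R_e = \frac{1 \cdot 2}{1 + 2} = \frac{2}{3},
\qquad
\sum_{e} w_e R_e = 3 \cdot \frac{2}{3} = 2 = n - 1,
\qquad
\frac{1}{R_e} = \frac{3}{2} \le 2 = k_e .
```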

It follows from Theorem 8 that the expected number of edges in skeleton $G_\epsilon$ is $O\left(\frac{n \log^2 n}{\epsilon^2}\right)$.

5.3 Sampling using Strong Connectivities 

For any edge $e$, set $\lambda_e$ to the strong connectivity of the edge; set $\alpha = 1$ and $\pi = 2^k$, where $k = \lfloor \lg \max_{e \in E} \{\lambda_e\} \rfloor$. Let $G_i$ contain all edges in $F_r$ for all $r \ge i$, where each edge in $F_r$ is replicated $2^{k-r}$ times. We use the following property of strong connectivities that also appears in [2].

7 There are many proofs of this fact, e.g. use linearity of expectation coupled with the fact that effective resistance of an edge is the probability that the edge is in a random spanning tree of the graph [3].
Lemma 10. In any undirected graph $G$, if an edge $e$ has strong connectivity $k$, then $e$ continues to have strong connectivity $k$ even after all edges with strong connectivity strictly less than $k$ have been removed from $G$.

Consider any cut $C$ with an edge $e \in F_i$. Let the corresponding cut (i.e. with the same bipartition of vertices) in $G_i$ be $C_i$. We need to show that the number of edges in $C_i$ is at least $2^k$ to prove $\pi$-connectivity. Let the maximum strong connectivity of an edge in $C$ be $k_C$, where $2^j \le k_C \le 2^{j+1} - 1$ for some $j \ge i$. By the above lemma, $C_i$ contains at least $2^j$ distinct edges of $G$, each of which is replicated at least $2^{k-j}$ times. Thus, $C_i$ contains at least $2^k$ edges.

We now prove $\alpha$-overlap. For any cut $C$, recall that $f_i^{(C)}$ and $e_i^{(C)}$ respectively denote the number of edges in $F_i \cap C$ and in $C_i$ (where $C_i$ is as defined in the previous paragraph). Then,

\[ \sum_{i=0}^{k} \frac{e_i^{(C)} 2^{i-1}}{\pi} = \sum_{i=0}^{k} \sum_{r=i}^{k} \frac{f_r^{(C)}\, 2^{k-r}\, 2^{i-1}}{2^k} = \sum_{i=0}^{k} \sum_{r=i}^{k} \frac{f_r^{(C)}}{2^{r-i+1}} = \sum_{r=0}^{k} \sum_{i=0}^{r} \frac{f_r^{(C)}}{2^{r-i+1}} < \sum_{r=0}^{k} f_r^{(C)} = c. \]

The size bound follows from the following lemma due to Benczur and Karger.

Lemma 11 (Benczur-Karger [2]). If $k_e$ is the strong connectivity of edge $e$ with weight $w_e$ in an undirected graph, then $\sum_e \frac{w_e}{k_e} \le n - 1$.

It follows from Theorem 8 that the expected number of edges in the skeleton graph $G_\epsilon$ is $O\left(\frac{n \log n}{\epsilon^2}\right)$ and that $G_\epsilon \in (1 \pm \epsilon)G$ whp.



6 Sampling in Graphs with Arbitrary Edge Weights 

Unfortunately, the algorithms presented earlier for sampling in a graph with polynomial edge weights fail if the edge weights are arbitrary. In particular, we can no longer guarantee that the expected number of edges in a skeleton graph constructed by these algorithms is $\tilde{O}(n/\epsilon^2)$, even though it continues to approximately preserve the weight of all cuts whp. Therefore, we need to modify our techniques to restore the size bounds, as described below.

We sort the edges in decreasing order of their weight, breaking ties arbitrarily. We add edges to the NI forests in this sorted order, i.e. when edge $e$ is being added, the NI forests contain all edges of weight greater than $e$. To insert $e = (u, v)$, we find the NI forest with the minimum index where $u$ and $v$ are not connected; call this index $i_e$. Then, $e$ is inserted in NI forests $T_{i_e}, T_{i_e+1}, \ldots, T_{i_e+w_e-1}$. Note that this does not produce any cycle in the NI forests since Lemma 1 ensures that if $u, v$ are disconnected in $T_{i_e}$, then they are not connected in $T_k$ for any $k > i_e$.
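The insertion order described above can be sketched with one union-find structure per forest. This is a naive illustration only: it scans the forests linearly to find $i_e$ and touches each of the $w_e$ forests explicitly, so it does not achieve the running time obtained later with the partition tree. The names `DisjointSet` and `first_forest_indices` are hypothetical, and the sketch assumes at most one edge per vertex pair.

```python
class DisjointSet:
    """Union-find over arbitrary hashable vertices (created lazily)."""
    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

    def connected(self, x, y):
        return self.find(x) == self.find(y)


def first_forest_indices(edges):
    """edges: list of (u, v, w) with positive integer weights w.
    Inserts edges in decreasing order of weight and returns
    {(u, v): i_e}, where i_e (1-based) is the first forest in which
    u and v were not yet connected."""
    forests = []                      # forests[t] represents T_{t+1}
    index = {}
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        i = 0
        while i < len(forests) and forests[i].connected(u, v):
            i += 1
        for t in range(i, i + w):     # insert into T_{i_e}, ..., T_{i_e+w_e-1}
            if t == len(forests):
                forests.append(DisjointSet())
            forests[t].union(u, v)
        index[(u, v)] = i + 1
    return index

print(first_forest_indices([("a", "b", 3), ("b", "c", 2), ("a", "c", 1)]))
```

In the triangle example, the lightest edge $(a, c)$ finds $a$ and $c$ already connected in the first two forests through $b$, so it is placed in the third.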

For any edge $e = (u, v)$, set $\lambda_e$ to the index of the first NI forest where edge $e$ is inserted, i.e. $\lambda_e = i_e$; also set $\alpha = 2$ and $\pi = 2^{i-1}$. For any $i \ge 1$, let $G_i$ contain all edges in NI forests $T_{2^{i-1}}, T_{2^{i-1}+1}, \ldots, T_{2^i-1}$ (call this set of edges $Y_i$) and all edges in $F_i$, i.e. all edges $e$ with $2^i \le \lambda_e \le 2^{i+1} - 1$. Let $G_0 = F_0$. For any $i \ne j$, $F_i \cap F_j = Y_i \cap Y_j = \emptyset$; thus, each edge appears in $G_i$ for at most two different values of $i$, proving $\alpha$-overlap. On the other hand, for any edge $e \in F_i$, Lemma 1 ensures that the endpoints of $e$ are connected in each of $T_{2^{i-1}}, T_{2^{i-1}+1}, \ldots, T_{2^i-1}$. It follows that $e$ is $2^{i-1}$-heavy in $G_i$, thereby proving $\pi$-connectivity.

We now prove the size bound on the skeleton. Partition edges into subsets $S_0, S_1, \ldots$ where $S_j$ contains all edges $e$ with $j \le \frac{\lambda_e}{w_e} < j + 1$. The following lemma states that none of these subsets is large.

Lemma 12. For any $j$, $|S_j| \le n - 1$.
Proof. We prove that the edges in any subset $S_j$ form an acyclic graph. Suppose not; let $C$ be a cycle formed by the edges in $S_j$, and $e = (u, v)$ be the edge that was inserted last in the NI forests among the edges in $C$. Let $e'$ be any other edge in $C$. Then, $w_{e'} \ge w_e$, and hence

\[ i_{e'} + w_{e'} - 1 \ge w_{e'}(j + 1) - 1 \ge w_e(j + 1) - 1 > i_e - 1. \]

Since both the first and last terms are integers, $i_{e'} + w_{e'} - 1 \ge i_e$. Therefore, $u'$ and $v'$ were connected in $T_{i_e}$ for each $e' = (u', v')$ in $C$. So, $u$ and $v$ were connected in $T_{i_e}$ since $C$ is a cycle, before $e$ was added to $T_{i_e}$. But, then $e$ would not have been added to $T_{i_e}$, a contradiction. □

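Thus,

\[ \sum_{e} \frac{w_e}{\lambda_e} \le (n - 1) \sum_{j : S_j \ne \emptyset} \frac{1}{j} = O(n \log n) \]

since at most $m \le n^2$ of the $S_j$s are non-empty. Using Theorem 8, we conclude that the skeleton $G_\epsilon$ has $O\left(\frac{n \log^2 n}{\epsilon^2}\right)$ edges in expectation and $G_\epsilon \in (1 \pm \epsilon)G$ whp.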

Finally, we need to show that the construction of NI forests where edges are added in decreasing order of weight can be done in $O(m \log^2 n)$ time. We use a data structure (call it a partition tree) $P$ to succinctly encode the NI forests. The leaf nodes in $P$ exactly correspond to the vertices in graph $G$, i.e. there is a one-one mapping between these two sets. On the other hand, each non-leaf node $v$ of the partition tree has a number $n(v)$ associated with it that satisfies the following property: for any two vertices $x, y$ in the graph, if $z$ is the least common ancestor$^8$ of their corresponding leaf nodes in $P$, then $x$ and $y$ are connected in exactly the first $n(z)$ NI forests. Then, $n(z) + 1$ is the index of the first NI forest where edge $(x, y)$ is to be inserted. Initially, all the $n$ leaf nodes in $P$ representing the graph vertices are children of the root node $r$, and $n(r) = 0$. As edges are inserted in the NI forests, the partition tree evolves, but we make sure that the above property holds throughout the construction. Additionally, we also maintain the invariant that if $x$ is a child of $y$ in $P$, then $n(x) > n(y)$.

We need to show that we can maintain the above properties of the partition tree as it evolves, and also retrieve the lca of any pair of vertices efficiently for this evolving partition tree. Let $(x, y)$ be the edge being inserted, let $z = \mathrm{lca}(x, y)$ in the partition tree, and let $u$ and $v$ be the children of $z$ that are ancestors of $x$ and $y$ respectively. Observe that adding an edge $(x, y)$ to trees with indices from $n_s + 1$ to $n_s + \ell$ increases the connectivity of a pair of vertices $w_1, w_2$ iff they were previously connected in $n_s + i$ trees for some $0 \le i < \ell$, $w_1, x$ were connected in $n_s + j$ trees for some $j > i$ and $w_2, y$ were connected in $n_s + k$ trees for some $k > i$ (or vice-versa). In this case, $w_1, w_2$ are now connected in $n_s + \min(j, k, \ell)$ trees after adding the edge $(x, y)$. Further, if $n(u) - n(z) < w(x, y)$, then an edge of weight less than $w(x, y)$ must have been added to the trees according to the second invariant, which violates the fact that edges are added in decreasing order of weight. Thus, $n(u) - n(z) \ge w(x, y)$; similarly $n(v) - n(z) \ge w(x, y)$.

There are three cases: 

1. $n(u) - n(z) = n(v) - n(z) = w(x, y)$. We merge $u$ and $v$ into a single node $s$ that remains a child of $z$ and $n(s) = n(u)$. The first invariant is clearly maintained. For the second invariant, observe that the only pairs of vertices $w_1, w_2$ whose connectivity changed were those with $\mathrm{lca}(w_1, w_2) = z$, where $w_1, w_2$ are descendants of $u, v$ respectively. Their connectivity increases to $n(u)$, which is reflected in the partition tree.

8 The least common ancestor or lca of two nodes $x, y$ in a tree is the deepest node that is an ancestor of both $x$ and $y$.
2. $n(u) - n(z) = w(x, y)$ and $n(v) - n(z) > w(x, y)$ (symmetrically for $n(u) - n(z) > w(x, y)$ and $n(v) - n(z) = w(x, y)$). We make $v$ a child of $u$ (from being a child of $z$), and $n(u) = n(z) + w(x, y)$. For notational convenience in the proofs later, we replace $u$ and $v$ by a pair of new nodes $s$ and $t$ where $n(s)$ and $n(t)$ are respectively equal to the updated values of $n(u)$ and $n(v)$. The first invariant is clearly maintained. For the second invariant, observe that the only pairs of vertices $w_1, w_2$ whose connectivity changed were those with $\mathrm{lca}(w_1, w_2) = z$, where $w_1, w_2$ are descendants of $u, v$ respectively. Their connectivity increases to $n(z) + w(x, y)$, which is reflected in the partition tree.

3. $n(u) - n(z) > w(x, y)$ and $n(v) - n(z) > w(x, y)$. We introduce a new node $r$ as a child of $z$ and parent of $u$ and $v$, and $n(r) = n(z) + w(x, y)$. For notational convenience in the proofs later, we replace $u$ and $v$ by a pair of new nodes $s$ and $t$ where $n(s) = n(u)$ and $n(t) = n(v)$. The first invariant is clearly maintained. For the second invariant, observe that the only pairs of vertices $w_1, w_2$ whose connectivity changed were those with $\mathrm{lca}(w_1, w_2) = z$, where $w_1, w_2$ are descendants of $u, v$ respectively. Their connectivity increases to $n(z) + w(x, y)$, which is reflected in the partition tree.
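The three cases above can be sketched with a plain parent-pointer tree. This is a naive illustration under stated assumptions: the lca is found by walking to the root (not in logarithmic time), merging in case 1 always moves the children of $v$ under $u$ regardless of their number, leaves are given $n = \infty$, and the class and function names are hypothetical. Edges must be inserted in decreasing order of weight, with distinct endpoints.

```python
import math

class Node:
    def __init__(self, n, parent=None):
        self.n = n                      # n(v); leaves use infinity
        self.parent = parent
        self.children = set()
        if parent is not None:
            parent.children.add(self)

def _reparent(child, new_parent):
    child.parent.children.discard(child)
    child.parent = new_parent
    new_parent.children.add(child)

def _path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path

class PartitionTree:
    def __init__(self, vertices):
        self.root = Node(0)             # n(r) = 0, all leaves under the root
        self.leaf = {x: Node(math.inf, self.root) for x in vertices}

    def insert(self, x, y, w):
        """Insert edge (x, y) of weight w; return the index i_e = n(z) + 1
        of the first NI forest that receives the edge."""
        px = _path_to_root(self.leaf[x])
        py = _path_to_root(self.leaf[y])
        while px[-2] is py[-2]:         # strip common ancestors below the lca
            px.pop()
            py.pop()
        z, u, v = px[-1], px[-2], py[-2]
        du, dv = u.n - z.n, v.n - z.n
        assert du >= w and dv >= w, "edges must arrive in decreasing weight order"
        if du == w and dv == w:         # case 1: merge v into u
            for c in list(v.children):
                _reparent(c, u)
            z.children.discard(v)
        elif du == w or dv == w:        # case 2: hang the deeper side under the other
            if dv == w:
                u, v = v, u
            _reparent(v, u)
        else:                           # case 3: new node between z and {u, v}
            r = Node(z.n + w, z)
            _reparent(u, r)
            _reparent(v, r)
        return z.n + 1

tree = PartitionTree("abc")
print([tree.insert(*e) for e in [("a", "b", 3), ("b", "c", 2), ("a", "c", 1)]])
```

On the triangle with weights 3, 2, 1 the insertions exercise case 3 twice and then case 2.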

We use the dynamic tree data structure [21] for updating the partition tree. This data structure can be used to maintain a dynamically changing forest of $n$ nodes, while supporting the following operations$^9$ in $O(\log n)$ time per operation:

Cut($v$) Cut the subtree under node $v$ from the tree containing it, and make it a separate tree with root $v$.

Link($v$, $w$) ($w$ needs to be the root node of a tree not containing $v$.) Join the tree rooted at $w$ and that containing $v$ by making $w$ a child of $v$.

LCA($v$, $w$) ($v$ and $w$ need to be in the same tree.) Defined previously.

We maintain a dynamic tree data structure for the partition tree. Recall that the partition tree can be modified in three different ways. The last two modifications require $O(1)$ cut and link operations each. Therefore, the overall time complexity of these modifications is $O(m \log n)$. On the other hand, the first modification requires $O(d)$ cut and link operations, where $d$ is the lesser number of children among $u$ and $v$. We will prove the following lemma bounding the total number of operations due to the first type of modification.

Lemma 13. The total number of cut and link operations due to modifications of the first type in the partition tree is $O(m \log n)$.

Theorem 10 follows immediately.

Theorem 10. The time complexity of constructing NI forests where edges are inserted in decreasing order of weight is $O(m \log^2 n)$ for graphs with arbitrary edge weights.

We now prove Lemma 13.

Proof of Lemma 13. We set up a charging argument for the cut and link operations due to the first type of modification. Define a function $f$ on the nodes of the partition tree where each node $v$ has $f(v) = 1$ initially. In the first type of modification, we assign $f(s) = f(u) + f(v)$; in the second type of modification, $f(s) = f(u) + f(v)$ and $f(t) = 1$; in the third type of modification, $f(r) = f(u) + f(v)$ and $f(s) = f(t) = 1$. Observe that the sum of $f(\cdot)$ over all nodes in the partition tree increases by at most 2 for any of the above modifications.

9 The dynamic tree data structure supports other operations as well; we only define the operations that we require. 






Let $C_u$ be the set of children of node $u$; then, let $F_C(u) = \sum_{v \in C_u} f(v)$. We charge the cut and link operations for the first type of modification to the children of $u$ (resp., $v$) if $F_C(u) \le F_C(v)$ (resp., $F_C(v) < F_C(u)$); each child of $u$ (resp., $v$) is charged $O(1)$ operations. Now, let $S_u$ be the set of siblings of any node $u$ in the partition tree; correspondingly, let $F_S(u) = \sum_{v \in S_u} f(v)$. Observe that whenever a node $u$ is charged due to the first type of modification, $F_S(u)$ at least doubles. Further, $F_S(u)$ never decreases for any node $u$ due to any of the three types of modifications. Since the sum of $f(\cdot)$ over all nodes in the partition tree increases by at most 2 for any of the modifications, and there are $m$ modifications overall, each node is charged at most $O(\log m) = O(\log n)$ times. Further, each modification introduces $O(1)$ new nodes; so the total number of operations due to modifications of the first type is $O(m \log n)$. □
A Proof of Theorem 9

We need the following inequality.

Lemma 14. Let $f(x) = x - (1 + x)\ln(1 + x)$ and $a = 1 - 2\ln 2$. Then,

\[ f(x) \le \begin{cases} a x^2 & \text{if } x \in (0, 1) \\ a x & \text{if } x \ge 1. \end{cases} \]

Proof. First, consider $x \in (0, 1)$. Define

\[ g(x) = \frac{f(x)}{x^2} = \frac{1}{x} - \left( \frac{1}{x} + \frac{1}{x^2} \right) \ln(1 + x). \]

We can verify that $g(x)$ is an increasing function of $x$ for $x \in (0, 1]$. Further, at $x = 1$, $g(x) = a$. Thus, $f(x) \le a x^2$ for $x \in (0, 1)$.

Now, consider $x \ge 1$. Define

\[ h(x) = \frac{f(x)}{x} = 1 - \left( 1 + \frac{1}{x} \right) \ln(1 + x). \]

We can verify that $h(x)$ is a decreasing function of $x$ for $x \ge 1$. Further, at $x = 1$, $h(x) = a$. Thus, $f(x) \le a x$ for $x \ge 1$. □
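The inequality of Lemma 14 is easy to spot-check numerically. The sketch below evaluates $f(x) = x - (1+x)\ln(1+x)$ on a grid and compares it against $(1 - 2\ln 2)x^2$ on $(0, 1)$ and $(1 - 2\ln 2)x$ for $x \ge 1$; the helper name `lemma14_holds` and the tolerance are illustrative choices, and a grid check is of course not a proof.

```python
from math import log, log1p

A = 1 - 2 * log(2)            # the constant of Lemma 14, roughly -0.386

def f(x):
    """f(x) = x - (1 + x) ln(1 + x), using log1p for accuracy near 0."""
    return x - (1 + x) * log1p(x)

def lemma14_holds(x, tol=1e-12):
    """Check f(x) <= A x^2 on (0, 1) and f(x) <= A x for x >= 1."""
    bound = A * x * x if x < 1 else A * x
    return f(x) <= bound + tol

print(all(lemma14_holds(k / 1000) for k in range(1, 1000)),
      all(lemma14_holds(1 + k / 10) for k in range(1000)))
```

Equality holds at $x = 1$, which is why a small tolerance is used in the comparison.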

We use the above inequality to prove the following lemmas. 

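Lemma 15. Suppose $X_1, X_2, \ldots, X_n$ is a set of independent random variables such that each $X_i$, $i \in \{1, 2, \ldots, n\}$, has value $1/p_i$ with probability $p_i$ for some fixed $0 < p_i \le 1$ and has value 0 with probability $1 - p_i$. For any $p \le \min_i p_i$ and for any $\epsilon > 0$,

\[ \mathbb{P}\left[ \sum_{i=1}^{n} X_i > (1 + \epsilon)n \right] < \begin{cases} e^{-0.38\epsilon^2 pn} & \text{if } 0 < \epsilon < 1 \\ e^{-0.38\epsilon pn} & \text{if } \epsilon \ge 1. \end{cases} \]

Proof. For any $t > 0$,$^{10}$

\[ \begin{aligned} \mathbb{P}\left[ \sum_{i=1}^{n} X_i > (1 + \epsilon)n \right] &= \mathbb{P}\left[ e^{t \sum_{i=1}^{n} X_i} > e^{t(1+\epsilon)n} \right] \\ &< \frac{\mathbb{E}\left[ e^{t \sum_{i=1}^{n} X_i} \right]}{e^{t(1+\epsilon)n}} && \text{(by Markov bound (see e.g. [17]))} \\ &= \frac{\prod_{i=1}^{n} \mathbb{E}\left[ e^{t X_i} \right]}{e^{t(1+\epsilon)n}} && \text{(by independence of } X_1, X_2, \ldots, X_n\text{)} \\ &= \frac{\prod_{i=1}^{n} \left( p_i e^{t/p_i} + 1 - p_i \right)}{e^{t(1+\epsilon)n}} = \frac{\prod_{i=1}^{n} \left( 1 + p_i (e^{t/p_i} - 1) \right)}{e^{t(1+\epsilon)n}} \\ &\le \exp\left( \sum_{i=1}^{n} p_i (e^{t/p_i} - 1) - t(1 + \epsilon)n \right) && \text{(since } 1 + x \le e^x,\ \forall x \ge 0\text{)}. \end{aligned} \]

Since $p_i \ge p$ for all $i \in \{1, 2, \ldots, n\}$,

\[ \sum_{i=1}^{n} p_i (e^{t/p_i} - 1) \le \sum_{i=1}^{n} p (e^{t/p} - 1) = np(e^{t/p} - 1). \]

Thus,

\[ \mathbb{P}\left[ \sum_{i=1}^{n} X_i > (1 + \epsilon)n \right] < \exp\left( np(e^{t/p} - 1) - t(1 + \epsilon)n \right). \]

Setting $t = p \ln(1 + \epsilon)$, we get

\[ \mathbb{P}\left[ \sum_{i=1}^{n} X_i > (1 + \epsilon)n \right] < \left( \frac{e^{\epsilon}}{(1 + \epsilon)^{1+\epsilon}} \right)^{pn}. \]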



-'For any random variable X, E[X] denotes the expectation of X. 
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Since 1 — 21n2 < —0.38, we can use Lemma [T4l to conclude that 

' e -0.38e> if < £ < 1 



L*»- >(l + e) 



< 



e -03Sepn jf e>1 . 



□ 
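As an illustration of Lemma 15, the following sketch compares the bound with the exact upper tail in the special case where every $p_i$ equals $p$. In that case $\sum_i X_i = B/p$ with $B$ binomially distributed, so the tail can be computed exactly. The function names and the parameter choices are assumptions made for this example only.

```python
import math


def upper_tail_exact(n, p, eps):
    """Exact P[sum X_i > (1 + eps) * n] when every p_i equals p.

    Here sum X_i = B / p with B ~ Binomial(n, p), so the event is
    B > (1 + eps) * n * p.
    """
    threshold = (1 + eps) * n * p
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if k > threshold
    )


def lemma15_bound(n, p, eps):
    """Right-hand side of Lemma 15 (eps^2 for eps < 1, eps otherwise)."""
    exponent = eps * eps if eps < 1 else eps
    return math.exp(-0.38 * exponent * p * n)


# Hypothetical parameter choices, covering both cases of the lemma.
for n, p, eps in [(200, 0.25, 0.5), (100, 0.5, 0.1), (50, 0.1, 1.0)]:
    print(n, p, eps, upper_tail_exact(n, p, eps), lemma15_bound(n, p, eps))
```

The equal-probability case is only the simplest instance; the lemma itself allows distinct $p_i$ as long as $p \le \min_i p_i$.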



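Lemma 16. Suppose $X_1, X_2, \ldots, X_n$ is a set of independent random variables such that each $X_i$, $i \in \{1, 2, \ldots, n\}$,
has value $1/p_i$ with probability $p_i$ for some fixed $0 < p_i \le 1$ and has value $0$ with probability $1 - p_i$. For any
$p \le \min_i p_i$ and for any $\epsilon > 0$,

$$\mathbb{P}\left[\sum_{i=1}^n X_i < (1-\epsilon)n\right] \begin{cases} < e^{-0.5\epsilon^2 pn} & \text{if } 0 < \epsilon < 1, \\ = 0 & \text{if } \epsilon \ge 1. \end{cases}$$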



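Proof. For $\epsilon \ge 1$,

$$\mathbb{P}\left[\sum_{i=1}^n X_i < (1-\epsilon)n\right] \le \mathbb{P}\left[\sum_{i=1}^n X_i < 0\right] = 0.$$

Now, suppose $\epsilon \in (0,1)$. For any $t > 0$,

$$\begin{aligned}
\mathbb{P}\left[\sum_{i=1}^n X_i < (1-\epsilon)n\right]
&= \mathbb{P}\left[e^{-t\sum_{i=1}^n X_i} > e^{-t(1-\epsilon)n}\right] \\
&\le \frac{\mathbb{E}\left[e^{-t\sum_{i=1}^n X_i}\right]}{e^{-t(1-\epsilon)n}} && \text{(by Markov bound)} \\
&= \frac{\prod_{i=1}^n \mathbb{E}\left[e^{-tX_i}\right]}{e^{-t(1-\epsilon)n}} && \text{(by independence of } X_1, X_2, \ldots, X_n\text{)} \\
&= \frac{\prod_{i=1}^n \left(p_i e^{-t/p_i} + 1 - p_i\right)}{e^{-t(1-\epsilon)n}} \\
&= \frac{\prod_{i=1}^n \left(1 - p_i\left(1 - e^{-t/p_i}\right)\right)}{e^{-t(1-\epsilon)n}} \\
&\le \exp\left(-\sum_{i=1}^n p_i\left(1 - e^{-t/p_i}\right) + t(1-\epsilon)n\right) && \text{(since } 1 - x \le e^{-x},\ \forall x \ge 0\text{)}.
\end{aligned}$$

Since $p_i \ge p$ for all $i \in \{1, 2, \ldots, n\}$,

$$\sum_{i=1}^n p_i\left(1 - e^{-t/p_i}\right) \ge \sum_{i=1}^n p\left(1 - e^{-t/p}\right) = np\left(1 - e^{-t/p}\right).$$

Thus,

$$\mathbb{P}\left[\sum_{i=1}^n X_i < (1-\epsilon)n\right] \le \exp\left(-np\left(1 - e^{-t/p}\right) + t(1-\epsilon)n\right).$$

Setting $t = -p\ln(1-\epsilon)$, we get

$$\mathbb{P}\left[\sum_{i=1}^n X_i < (1-\epsilon)n\right] \le \left(\frac{e^{-\epsilon}}{(1-\epsilon)^{1-\epsilon}}\right)^{pn} < e^{-0.5\epsilon^2 pn},$$

where the last inequality holds since $(1-\epsilon)\ln(1-\epsilon) > -\epsilon + \epsilon^2/2$ for $\epsilon \in (0,1)$.

□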
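Lemma 16 also covers unequal sampling probabilities. The sketch below checks it on a small hypothetical instance by enumerating all $2^n$ outcomes, which is feasible only for small $n$. The probability vector and the names `lower_tail_exact` and `lemma16_bound` are illustrative assumptions, not taken from the paper.

```python
import itertools
import math


def lower_tail_exact(probs, eps):
    """Exact P[sum X_i < (1 - eps) * n] by enumerating all 2^n outcomes.

    X_i equals 1/p_i with probability p_i and 0 otherwise, independently.
    """
    n = len(probs)
    threshold = (1 - eps) * n
    tail = 0.0
    for outcome in itertools.product((0, 1), repeat=n):
        weight = 1.0  # probability of this outcome
        total = 0.0   # value of sum X_i under this outcome
        for picked, p_i in zip(outcome, probs):
            weight *= p_i if picked else 1 - p_i
            total += 1 / p_i if picked else 0.0
        if total < threshold:
            tail += weight
    return tail


def lemma16_bound(probs, eps):
    """Right-hand side of Lemma 16, using p = min_i p_i."""
    if eps >= 1:
        return 0.0
    return math.exp(-0.5 * eps * eps * min(probs) * len(probs))


# Hypothetical instance with n = 10 distinct sampling probabilities.
probs = [0.3, 0.5, 0.8, 0.4, 0.6, 0.9, 0.35, 0.7, 0.45, 0.55]
for eps in (0.2, 0.5, 0.8):
    print(eps, lower_tail_exact(probs, eps), lemma16_bound(probs, eps))
```

Taking `min(probs)` as $p$ gives the strongest bound the lemma permits for a given instance; any smaller $p$ is also valid but yields a weaker bound.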
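We now prove Theorem 9 using the above lemmas.

Proof of Theorem 9. Let $\delta = \epsilon N / |C|$. First, consider the case where $\delta \in (0, 1)$. From Lemmas 15 and 16 we
conclude that

$$\begin{aligned}
\mathbb{P}\left[\left|\sum_{e \in C} X_e - |C|\right| > \epsilon N\right]
&= \mathbb{P}\left[\left|\sum_{e \in C} X_e - |C|\right| > \delta |C|\right] \\
&< 2e^{-0.38\delta^2 p|C|} \\
&= 2e^{-0.38\epsilon^2 pN (N/|C|)} \le 2e^{-0.38\epsilon^2 pN} && \text{(since } N \ge |C|\text{)}.
\end{aligned}$$

Now, consider the case where $\delta \ge 1$. From Lemmas 15 and 16 we conclude that

$$\begin{aligned}
\mathbb{P}\left[\left|\sum_{e \in C} X_e - |C|\right| > \epsilon N\right]
&= \mathbb{P}\left[\left|\sum_{e \in C} X_e - |C|\right| > \delta |C|\right] \\
&< e^{-0.38\delta p|C|} = e^{-0.38\epsilon pN} \le e^{-0.38\epsilon^2 pN} && \text{(since } \epsilon \le 1\text{)}.
\end{aligned}$$

□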
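The case analysis in the proof above can be mirrored in a few lines of code. The sketch below assumes that Theorem 9 states the bound $2e^{-0.38\epsilon^2 pN}$ for $\epsilon \in (0,1]$ and $N \ge |C|$, which is what the first case yields; the function names and the parameter ranges are hypothetical. It computes the bound obtained in each case via $\delta = \epsilon N/|C|$ and confirms that it never exceeds the assumed theorem bound.

```python
import math


def case_bound(cut_size, N, p, eps):
    """Bound from Lemmas 15 and 16 after substituting delta = eps * N / |C|."""
    delta = eps * N / cut_size
    if delta < 1:
        # Both tails contribute; 0.38 is the weaker of the two constants.
        return 2 * math.exp(-0.38 * delta**2 * p * cut_size)
    # For delta >= 1 the lower tail has probability 0 (Lemma 16).
    return math.exp(-0.38 * delta * p * cut_size)


def theorem9_bound(N, p, eps):
    """Assumed form of the Theorem 9 bound: 2 * exp(-0.38 * eps^2 * p * N)."""
    return 2 * math.exp(-0.38 * eps**2 * p * N)


def dominated(cut_size, N, p, eps, rel_tol=1e-12):
    """True if the per-case bound is at most the assumed Theorem 9 bound."""
    return case_bound(cut_size, N, p, eps) <= theorem9_bound(N, p, eps) * (1 + rel_tol)


# Hypothetical parameter sweep with N >= |C| and eps in (0, 1].
checks = [
    dominated(c, N, p, eps)
    for c in (1, 5, 40)
    for N in (40, 100, 400)
    for p in (0.05, 0.5, 1.0)
    for eps in (0.1, 0.5, 1.0)
]
print("per-case bound dominated everywhere:", all(checks))
```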